DEPARTMENT OF COMMERCE PATENT AND TRADEMARK OFFICE



TRANSMITTAL LETTER TO THE UNITED STATES 
DESIGNATED/ELECTED OFFICE (DO/EO/US) 
CONCERNING A FILING UNDER 35 U.S.C. 371 



PRIORITY DATE CLAIMED 



TITLE OF INVENTION 

MOLECULES FOR DISEASE DETECTION AND TREATMENT 



APPLICANT(S) FOR DO/EO/US

INCYTE GENOMICS, INC.; HODGSON, David M.; LINCOLN, Stephen E.; RUSSO, Frank D.; SPIRO, Peter A.;
BANVILLE, Steven C.; BRATCHER, Shawn R.; DUFOUR, Gerard E.; COHEN, Howard J.; ROSEN, Bruce H.;
CHALUP, Michael S.; HILLMAN, Jennifer L.; JONES, Anissa L.; YU, Jimmy Y.; GREENAWALT, Lila B.; PANZER,
Scott R.; ROSEBERRY, Ann M.; WRIGHT, Rachel J.; DANIELS, Susan E.



Applicant herewith submits to the United States Designated/Elected Office (DO/EO/US) the following items and 
other information: 

1. ☒ This is the FIRST submission of items concerning a filing under 35 U.S.C. 371.

2. □ This is a SECOND or SUBSEQUENT submission of items concerning a filing under 35 U.S.C. 371.

3. □ This is an express request to promptly begin national examination procedures (35 U.S.C. 371(f)).

4. □ The US has been elected by the expiration of 19 months from the priority date (PCT Article 31).

5. ☒ A copy of the International Application as filed (35 U.S.C. 371(c)(2))

a. □ is attached hereto (required only if not communicated by the International Bureau) 

b. □ has been communicated by the International Bureau. 

c. ☒ is not required, as the application was filed in the United States Receiving Office (RO/US).

6. □ An English language translation of the International Application as filed (35 U.S.C. 371(c)(2)).

7. ☒ Amendments to the claims of the International Application under PCT Article 19 (35 U.S.C. 371(c)(3))

a. □ are attached hereto (required only if not communicated by the International Bureau). 

b. □ have been communicated by the International Bureau. 

c. □ have not been made; however, the time limit for making such amendments has NOT expired. 

d. ☒ have not been made and will not be made.

8. □ An English language translation of the amendments to the claims under PCT Article 19 (35 U.S.C. 371(c)(3)).

9. ☒ An oath or declaration of the inventor(s) (35 U.S.C. 371(c)(4)).

10. □ An English language translation of the annexes to the International Preliminary Examination Report under
PCT Article 36 (35 U.S.C. 371(c)(5)).



Items 11 to 16 below concern document(s) or information included: 

11. □ An Information Disclosure Statement under 37 CFR 1.97 and 1.98. 

12. ☒ An assignment document for recording. A separate cover sheet in compliance with 37 CFR 3.28 and 3.31 is included.

13. ☒ A FIRST preliminary amendment, as follows: Cancel in this application original claims 21 thru 56 before
calculating the filing fee, without prejudice or disclaimer. Applicants submit that these claims were included in the 
application as filed in the interest of providing notice to the public of certain specific subject matter intended to be 
claimed, and are being canceled at this time in the interest of reducing filing costs. Applicants expressly state that 
these claims are not being canceled for reasons related to patentability, and are in fact fully supported by the 
specification as filed. Applicants expressly reserve the right to reinstate these claims or to add other claims during 
prosecution of this application or a continuation or divisional application. Applicants expressly do not disclaim the 
subject matter of any invention disclosed herein which is not set forth in the instantly filed claims.

□ A SECOND or SUBSEQUENT preliminary amendment.

14. □ A substitute specification. 

15. □ A change of power of attorney and/or address letter.

16. ☒ Other items or information:

1) Transmittal Letter (2 pp, in duplicate) 

2) Return Postcard 

3) Express Mail Label No.: EL 697 344 303 US 

4) Article 34 Amendment 
U.S. APPLICATION NO. (if known, see 37 CFR 1.5)


INTERNATIONAL APPLICATION NO.:
PCT/US00/15344


ATTORNEY'S DOCKET NUMBER
PT-1042 USN



17. ☒ The following fees are submitted:

BASIC NATIONAL FEE (37 CFR 1.492(a)(1)-(5)):

Neither international preliminary examination fee (37 CFR 1.482) nor international search fee (37 CFR 1.445(a)(2)) paid to USPTO and International Search Report not prepared by the EPO or JPO $1000.00

□ International preliminary examination fee (37 CFR 1.482) not paid to USPTO but International Search Report prepared by the EPO or JPO $860.00

International preliminary examination fee (37 CFR 1.482) not paid to USPTO but international search fee (37 CFR 1.445(a)(2)) paid to USPTO $710.00

☒ International preliminary examination fee paid to USPTO (37 CFR 1.482) but all claims did not satisfy provisions of PCT Article 33(1)-(4) $710.00

□ International preliminary examination fee paid to USPTO (37 CFR 1.482) and all claims satisfied provisions of PCT Article 33(1)-(4) $100.00



ENTER APPROPRIATE BASIC FEE AMOUNT = 



Surcharge of $130.00 for furnishing the oath or declaration later than □ 20 □ 30
months from the earliest claimed priority date (37 CFR 1.492(e)).



NUMBER FILED 



NUMBER EXTRA 



Independent Claims



MULTIPLE DEPENDENT CLAIM(S) (if applicable)



TOTAL OF ABOVE CALCULATIONS = 



SUBTOTAL = 



TOTAL NATIONAL FEE = 



Fee for recording the enclosed assignment (37 CFR 1.21(h)). The assignment must be
accompanied by the appropriate cover sheet (37 CFR 3.28, 3.31). $40.00 per property +



TOTAL FEES ENCLOSED 



a. □ A check in the amount of $ to cover the above fees is enclosed.

b. ☒ Please charge my Deposit Account No. 09-0108 in the amount of $710.00 to cover the above fees.

c. ☒ The Commissioner is hereby authorized to charge any additional fees which may be required, or credit any
overpayment to Deposit Account No. 09-0108. A duplicate copy of this sheet is enclosed.

NOTE: Where an appropriate time limit under 37 CFR 1.494 or 1.495 has not been met, a petition to revive (37 CFR 1.137(a) or (b)) must
be filed and granted to restore the application to pending status.

SEND ALL CORRESPONDENCE TO: 



INCYTE GENOMICS, INC. 
3160 Porter Drive 
Palo Alto, CA 94304 



NAME: Diana Hamlet-Cox
REGISTRATION NUMBER: 33,302 



10/009416 



30 NOV 2001 

Docket No.: PT-1042 USN

"Express Mail" mailing label number EL 697 344 303 US. I hereby certify that
this document and referenced attachments are being deposited with the United
States Postal Service "Express Mail Post Office to Addressee" service under
37 CFR § 1.10, addressed to: Commissioner for Patents,
Box Patent Application, P.O. Box 2327, Arlington, VA 22202 on 30 November 2001.

By: [signature] Printed: [illegible]

IN THE UNITED STATES PATENT AND TRADEMARK OFFICE 

In re Application of: David M. Hodgson, Stephen E. Lincoln, Frank D. Russo, Peter A. 

Spiro, Steven C. Banville, Shawn R. Bratcher, Gerard E. Dufour, 
Howard J. Cohen, Bruce H. Rosen, Michael S. Chalup, Jennifer L. 
Hillman, Anissa L. Jones, Jimmy Y. Yu, Lila B. Greenawalt, Scott R. 
Panzer, Ann M. Roseberry, Rachel J. Wright & Susan E. Daniels 

Title: MOLECULES FOR DISEASE DETECTION AND TREATMENT

PCT Serial No.: PCT/US00/15344 International Filing Date: 01 June 2000

Examiner: To Be Assigned Group Art Unit: To Be Assigned 



Assistant Commissioner for Patents 
Box Patent Application 
P.O. Box 2327 
Arlington, Virginia 22202 

SUBMISSION UNDER 37 CFR § 1.821-1.825 SEQUENCE LISTING 

Sir: 

In accordance with the requirements of 37 CFR § 1.821-1.825, Applicants hereby submit 
one (1) diskette(s) containing the computer-readable information for the Sequence Listing of the
above-identified application. The content of the Sequence Listing paper copy is identical to the 
computer-readable copy filed with the US Receiving Office. The USPTO is authorized to add 
whatever is necessary to update the CRF with the current application information. 

Respectfully submitted, 
INCYTE GENOMICS, INC. 

Date: 30 November 2001 



Reg. No. 33,302 

Direct Dial Telephone: (650) 845-4639 

3160 Porter Drive 
Palo Alto, California 94304 
Phone: (650) 855-0555 
Fax: (650) 845-4166 



Diana Hamlet-Cox



RAW SEQUENCE LISTING DATE: 01/14/2002 

PATENT APPLICATION: US/10/009,416 TIME: 11:48:12 

Input Set : A:\ipptl042usn.app.txt
Output Set: N:\CRF3\01142002\J009416.raw



4 



5 <110> APPLICANT: INCYTE GENOMICS, INC. 

6 Hodgson, David M. 

7 Lincoln, Stephen E. 

8 Russo, Frank D. 

9 Spiro, Peter A.

10 Banville, Steven C. 

11 Bratcher, Shawn R. 

12 Dufour, Gerard E. 

13 Cohen, Howard J. 

14 Rosen, Bruce H. 

15 Chalup, Michael S. 

16 Hillman, Jennifer L. 

17 Jones, Anissa L.
18 Yu, Jimmy Y.
19 Greenawalt, Lila B.

20 Panzer, Scott R.

21 Roseberry, Ann M.

22 Wright, Rachel J.

23 Daniels, Susan E.

26 <120> TITLE OF INVENTION: MOLECULES FOR DISEASE DETECTION AND TREATMENT

29 <130> FILE REFERENCE: PT-1042 PCT
C--> 31 <140> CURRENT APPLICATION NUMBER: US/10/009,416
C--> 33 <141> CURRENT FILING DATE: 2001-11-30

35 <150> PRIOR APPLICATION NUMBER: US 60/137,412; 60/147,500; 60/147,501; 60/147,542
W--> 36 <151> PRIOR FILING DATE: 1999-06-03; 1999-08-05; 1999-08-05; 1999-08-05

39 <160> NUMBER OF SEQ ID NOS: 14

41 <170> SOFTWARE: PERL Program

43 <210> SEQ ID NO: 1

44 <211> LENGTH: 3101

45 <212> TYPE: DNA 

46 <213> ORGANISM: Homo sapiens 

48 <220> FEATURE: 

49 <221> NAME/KEY: misc_feature 

50 <223> OTHER INFORMATION: Incyte ID No: 222197.6 

52 <220> FEATURE: 

53 <221> NAME/KEY: unsure

54 <222> LOCATION: 3077, 3084, 3093, 3097-3098 

55 <223> OTHER INFORMATION: a, t, c, g, or other 
57 <400> SEQUENCE: 1

58 agtgattgca tgagcttagg gaggggagtg acatgatctg atttacgtct gtggaagacc 60 

62 actctgggtg ctgcatgggg gactggactg ttgggtgagc agaggcatga gagtagagag 120 

63 aggactggtc agcaggtgat ctaagcactc cccagatccg atcacatagg acagtatgca 180 

64 ccttaagatc ctgaagaaac ggcacaaaat gttcaagtga tgtttagaaa taacttgtga 240

65 gggtgcgtca gggaaatcat gcagccatca ggacacaggc tccgggacgt cgagcatcat 300 

66 cctctcctgg ctgaaaatga caactatgac tcttcatcgt cctcctcctc cgaggctgac 360 

67 gtggctgacc gggtctggtt catccgtgac ggctgcggca tgatctgtgc tgtcatgacg 420 

68 tggcttctgg tcgcctatgc agacttcgtg gtgactttcg tcatgctgct gccttccaaa 480 

69 gacttctggt actctgtggt caacggggtc atctttaact gcttggccgt gcttgccctg 540 

70 tcatcccacc tgagaaccat gctcaccgac cctggggcag tacccaaagg aaacgctacg 600 

71 aaagaataca tggagagctt gcagctgaag cccggggaag tcatctacaa gtgccccaag 660 

72 tgctgctgta ttaaacccga gcgcgcccac cactgcagta tttgcaaaag atgtattcgg 720 

73 aaaatggatc atcactgccc gtgggtgaac aattgtgtag gagaaaagaa tcaaagattt 780 

74 tttgtgctct tcactatgta tatagctctg tcttcagtcc atgctctgat cctttgtgga 840 

75 tttcagttca tctcctgtgt ccgagggcag tggactgaat gcagtgattt ttcacctccg 900 

76 ataactgtaa tcctgttgat cttcctgtgc cttgagggtc ttctgttttt cactttcact 960 

77 gcagttatgt ttggcaccca aatccactcc atatgcaacg acgagacgga gatcgagcga 1020 

78 ttgaaaagtg agaagcccac atgggagcgg aggctgcgat gggaagggat gaagtccgtc 1080 

79 tttggggggc ccccctcact cctctggatg aatccctttg tgggcttccg atttaggcga 1140 

80 ctgcccacga gacccagaaa aggtggcccg gagttctcag tgtgaggcgt ggctcatcag 1200 

81 actgaaactt gctcacagac ttccagttat ttatttgggg tctgaaggat atcaacagct 1260 

82 catctgtgac caacagggca actggaacct acacaaacca attgcttgca gcaagcagag 1320 

83 ttttatatat ttatagtcac agatggcaga ggaagaggct ctcagtcccc acctgtacaa 1380 

84 caacggaaag gtgtgtggcc acacgaagaa gccaaacgcc gtggcctcct gcagagctgg 1440 
85 ggcttctgtg gagaatactt cgggttatta catgggttat tcaaatcctg ggtcctgagc 1500
86 tgctgtttcc aatcatgaag aaaaacagtg aatccagtga acagggattc tccaagcagt 1560
87 catttcaggg ggctcctgct gaccccgcca ctcagcagtg cactccccgg atcacagcag 1620

88 ggcgtttaca tagaaagacg ttttggtctc gattagctcc gatgctttgc gctgaagttg 1680 

89 caaaagatct gtgcactgaa cagtgaaggt ggcttccggc acactccccg ctgccccgga 1740 

90 agagacatcc tttgaccctc tcagcaagtc tgtgtgtgtg cgtgtctgtg cgtgtgcgcg 1800 
91 cgtgtgtgca tgtgtgtcaa aattgccagt gttgtttagg caatgtaaca tttaccggct 1860
92 gtgtacagca aacaagctat tttttagaaa ccgacgtttc agggaagagg ggagagagcc 1920
93 gcggggtcct gcccgtggtt actatgaatg tattgctgtt ggaggacatc tcgatccaaa 1980
94 gaacagccgt tcctgtgcgg cccttcgttg ccctcctgct ttcatttttt aaagaaatct 2040

95 tgagtgcttg agggccttgg aactgatttt tttttttttg ttccagccaa attagcagtg 2100 
96 tataaatggc acctaggtaa gagcagagct gcggctcggt gacttgatac ttggggcagc 2160
97 cccgatgctg tgtgtggggc aggggaggca tccttactgg agaggcaggg cccagccatt 2220

98 gggcacctct gggaagggga ggggaccatg aggcagccag cccctggcag gggcgactgt 2280 
99 gccaccgcag gcagcgctcc agtccgggaa tggccaggat ggcgccctct tgttggagtt 2340

100 tttggttagc ttttacgttt tcttctccac ccacggcaca ggtgataaaa taggatcctt 2400 

101 ggtgcggact ttaaaattat gccagaaagc caacagctcc cctcgtgggg ccttgcctta 2460 

102 aacttgcctg gtttgtacat tttttgccgg acgcatcaag aagcaatctg tgacaaagtc 2520 
103 tgagggtctt cctttatgct tgccctccac actaagagaa gttggcgtct ccctcctggg 2580

104 aattgttttg cctttctgtt catctgtgaa ctgttttttg tttttaatta ctctgtaccc 2640 

105 catccgaatc agggcttcta ccactgctga tgcaaaacca caaagggacc tacctgagcc 2700 

106 accgtcctag ccaagcgagc aaacctgcag ggggtttgga agtggacttg gtcaccgcag 2760 

107 aagcgtgtgc gccgttgggg gaagagctgc gtcacagcca gagggacaaa gtgtgggtga 2820 

108 tcctggagac gccagtttcc gagattgttc tgcatattca tttgcacatt gttgtctggg 2880 

109 ttggacatgc gtgtgggctt cagtgtgagg cttttaatat gtatatcctg ttatcaataa 2940 

110 aacaattatc caagtggttg aatcctgtga gacttggcaa gtgtgtgcaa atcaagtata 3000 

111 cttgactttt caacctcttc tttcaatgta acttttatat gaaataaagt aatcaattaa 3060 
W--> 112 cagttctcaa aaaaaanaaa gggnggccgc cgnctannga g 3101

114 <210> SEQ ID NO: 2 

115 <211> LENGTH: 2561 

116 <212> TYPE: DNA 

117 <213> ORGANISM: Homo sapiens 
119 <220> FEATURE: 

120 <221> NAME/KEY: misc_feature 

121 <223> OTHER INFORMATION: Incyte ID No: 227709.3 

123 <220> FEATURE: 

124 <221> NAME/KEY: unsure 

125 <222> LOCATION: 126, 2144 

126 <223> OTHER INFORMATION: a, t, c, g, or Other 

128 <400> SEQUENCE: 2 

129 gcgggcgcgt cgccctctgc ccccgccggc accctggcca tgacaggcaa gtcggtgaag 60 

130 gacgtggatc ggtaccaggc tgtcctggcc aacctgctgc tggaggagga taacaagttt 120 
W--> 131 tgtgcngatt gccagtctaa agggccgcga tgggcctctt ggaacattgg tgtgttcatc 180

132 tgcattcgat gtgctggaat ccacaggaat ctgggggtgc acatatccag ggtaaagtca 240 

133 gttaacctcg accagtggac tcaagaacag attcagtgca tgcaagagat gggaaatgga 300 

134 aaggcaaacc gactttatga agcctatctt cctgagacct ttcggcgacc tcagatagac 360 

135 ccagctgttg aaggatttat tcgagacaaa tatgagaaga agaaatacat ggaccgaagt 420 

136 ctggacatca atgcctttag gaaagaaaaa gatgacaagt ggaaaagagg gagcgaacca 480 

137 gttccagaaa aaaaattgga acctgttgtt tttgagaagg tgaaaatgcc acagaaaaaa 540 

138 gaagacccac agctacctcg gaaaagctcc ccgaaatcca cagcgcctgt catggatttg 600 
139 ttgggccttg atgctcctgt ggcctgctcc attgcaaata gtaagaccag caatacccta 660
140 gagaaggatt tagatctgtt ggcctctgtt ccatcccctt cttcttcggg ttccagaaag 720
141 gttgtaggtt ccatgccaac tgcagggagt gccggctctg ttcctgaaaa tctgaacctg 780
142 tttccggagc cagggagcaa atcagaagaa ataggcaaga aacagctctc taaagactcc 840
143 attctttcac tgtatggatc ccagacgcct caaatgccta ctcaagcaat gttcatggct 900
144 cccgctcaga tggcatatcc cacagcctac cccagcttcc ccggggttac acctcctaac 960
145 agcataatgg ggagcatgat gcctccacca gtaggcatgg ttgctcagcc aggagcttct 1020

146 gggatggttg cccccatggc catgcctgca ggctatatgg gtggcatgca ggcatcaatg 1080 

147 atgggtgtgc cgaatggaat gatgaccacc cagcaggctg gctacatggc aggcatggca 1140 
148 gctatgcccc agactgtgta tggggtccag ccagctcagc agctgcaatg gaaccttact 1200
149 cagatgaccc agcagatggc tgggatgaac ttctatggag ccaatggcat gatgaactat 1260
150 ggacagtcaa tgagtggcgg aaatggacag gcagcaaatc agactctcag tcctcagatg 1320
151 tggaaataaa aacaaaacac ctgtatggct gccattctct tcagccctgc gctctcccct 1380
152 ttccacagcc tccacccctg acccccatcc tcttttccta cctctctgtt tggtttagaa 1440

153 attgctcaat aagtcatttg gggtttggca tcctgcccag ccacttccca aacatgaaga 1500 
154 cctctctgtt gctttatgtt gtacatgccc catagccatc ccaacgtcct ccccagtcct 1560

155 ctcctggcac cagcacctta gaagttgttg gcagaaggca cttaaactgt gggagaagtg 1620 

156 tgcacacctt tgagtccctt ccctcaaggt taaagctcct gtcagactct cagaagggtc 1680 

157 tgtgggtgtt gtatattagg caaacagggg aaagcttaga ggtccttcta tatgtgttaa 1740 

158 taagctgttt ctaagtgttt aaatttgaaa agcatcatgt tctcatgatt tatgggaatg 1800 

159 aagcaagtac tgaaatcaaa ttaaatactc cctgggtcct gggtcagttt gaccctagcc 1860 

160 ctggggtgag gcaagccccc tcctatgagg atgagcaaaa atactactct cttcgccctg 1920 

161 agttgctttc tggatctggg gcttcaggac ttgctgcttc agtcagcctt tattagcacc 1980

162 aaagacttta tgaagatccc acacacagac acacatccct tcccgcctcc cccctgcctt 2040 
163 cagtaggatc tggctccgtg gctggaggac caacccctat agtgggaatg cagagcttaa 2100

W--> 164 cgtgtactgc ttgtgtgtgt gcgtgaagtg tgtgtgtgtg taanaagtgt gtgttccgcc 2160

165 tcccaccctc tccccatctg ctctgggtat ttttgttttt gtttagtttt aggtttacaa 2220 

166 cagagaggaa ttaatttatc agcagcctaa aactgttgtg tttttcttat ggtttaaaaa 2280 

167 acgccatgtc attgataact ccctttctcc cttcccttct cccggtctgc tgatcactct 2340 

168 ttcatgcctg tgtatccagg gtgctctgtt tccccaccgt tcccaggtgt acgaggcaga 2400 

169 gggccgggac agctttcctc tcagtcattg ttcaccccac ttgaaaattc agacaagaaa 2460 

170 actttgctta aaagatttca tgtgtgggaa ccacagttcc tggctgcctt tctcctgtgt 2520 

171 atgtgtaaat tccttaataa atattgcagg gaaggactgt t 2561 

173 <210> SEQ ID NO: 3 

174 <211> LENGTH: 2710 

175 <212> TYPE: DNA 

176 <213> ORGANISM: Homo sapiens 

178 <220> FEATURE: 

179 <221> NAME/KEY: misc_feature

180 <223> OTHER INFORMATION: Incyte ID No: 237703.2 

182 <220> FEATURE: 

183 <221> NAME/KEY: unsure

184 <222> LOCATION: 712, 799, 2332, 2334, 2342, 2470, 2611, 2682

185 <223> OTHER INFORMATION: a, t, c, g, or other 

187 <400> SEQUENCE: 3 

188 caggtacctt attacattat tttatttaat catcatgtat ccctataagt aatattcttc 60 

189 tcttttatag agtaagaaat tgagattcag gaatattaat ttgcccagga tcatcgaagt 120 

190 ggaatgaaca tcaaaagcct attccctctg cttggccact tccacctcat tttactaagt 180 

191 ttccccatgt ctgtgttagt aaactaataa ctaaaagggt ctcgcatttt aaatagcttt 240 

192 ttaacccaag agcatgccac atttaaccag aggcccatag aacaaactga aaattacaac 300 
193 ctaaaaggtt gtttctaagg ttgtattgag aaggaattga gctcttgaat ccctagaatt 360

194 ccttattaat actttattct tctgttaaaa gttttatttt taaaagtttc atacagtgtg 420
195 tatattggtg tgataatcct acagaaaaat caagcagtta tgttttcttc acagataaca 480
196 cataaaatat taaacagaaa gcctatgtta ttcattggac tgaagctttt atgcaataaa 540

197 ccttagttgg accaggagta aatgtatggt ttgatattca gagaatctca ttcttagaag 600 

198 caacaaagtg tagttaacac taacttgttc attcttaaat cagtagtcct ctcctcccca 660 
W--> 199 aaaagagatc ttaaattatt ttcatttaaa gtcatctact aacaagtaag tntttattca 720
200 acttaattaa atctaacacc acaagacaat tttgttttag ttattgtttt ggtttgagtt 780
W--> 201 gagttgaaag atttctttnt tttcttctca gcttaccaca gtgcggagac tgctttctga 840

202 aaaggccact cacgtgaaca ctagggatga agatgagtat acccctcttc atcgagcagc 900
203 ctacagtgga cacttagata ttgttcagga gctcattgca cagggggccg atgttcatgc 960

204 agtgactgtg gatggctgga cgcccctgca cagtgcttgt aagtggaata atcccagagt 1020

205 ggcttctttc ttactgcagc atgatgcaga tatcaatgcc caaacaaaag gcctcttgac 1080
206 ccccttgcat cttgctgctg ggaacagaga cagcaaggat accctagaac tcctcctgat 1140
207 gaaccgttac gtcaaaccag ggctgaaaaa caacttggaa gaaactgcat ttgatattgc 1200
208 caggaggaca agtatctatc actacctctt tgaaattgtg gaaggctgta caaattcttc 1260
209 acctcagtct taacaattct agtaattttc ctaagtttct aaataccagt gcctcctgtg 1320

210 tgtgagatgt attcccataa tcaaagttga cgtcaaacat cttactacaa aaattcagtg 1380 

211 acattcatta taacattctt ccaagtgaat tgcctgactt tgatgtcaaa atgtatttga 1440 

212 aagtaatttg catatatctt taattatttc tgtggagttt gtgatttttt tatcagaaat 1500 

213 aattttaatg tgtgtatact taaaaacttg acacgggttg tacagaaact ggtatttttg 1560 

214 gtgctgatac aagagaaatg tatttttaaa tatcccacat cctggatctt tgttgggtat 1620 

215 ttagtatatt gacatatatt tttataaggt gaggtaactc agaacttaat ttaaaagtct 1680 

216 taaatattct gatacaattc agctgtcttc tctaccttac catagccagt tgctttcatt 1740 

217 ttaaaccaga gcaagtaaca tattagtgac ttgaatcttc ataagttaaa gtaaaaaaca 1800 

218 gcaaaaaacc tagatctttg tcttttagaa cacagaccat tttcaggaaa gcagttagct 1860 

219 aagtgtttaa ttcatgaata ttgtatactg catcccctac cacaatttac acaatcctgt 1920 

220 ggatagtcct acctcaccct ggtcaaccta catgatcctt aagctaatgg cgaatcacga 1980 

221 tgaccttgta gacatgcaca caactatacc tttgtccaac agatcataat atatctgcta 2040 

222 tccaactggt tttacctgcc taatcctact gatttgggca ctgcttgtat agtctctcaa 2100 

223 gttcacagga aatgttgatt ttctaaggtc ctcattttta cagagtatac aggcaaagtg 2160 

224 acaggggaaa aggaattagt ctaagagtaa ggggatgatt attatattga ggctaaaacc 2220 

225 acaaagtggc tcaggcttta aaaaaaaaac actgtggata atgacaaaaa gcataagtaa 2280 
W--> 226 aaatatttga gaaaaataaa gtacaagttt tgaacaacac aaaaggcatg antncatttt 2340
W--> 227 tnacctgtgt atgtctttct tggatccaga acattattca tccagcacgc acttagttat 2400

228 ttaacatcta ctcactcagt ctctccagca gcaatttttg cattgtctat ctagcccctt 2460
W--> 229 tgtgattgtn cccaaagttt tgtcttctca acaccacaac actccagggg aagggaacta 2520
230 aaccagttgc tctttacttc agttaaattt ttaagatgtc caccaaggct tatctctttc 2580
W--> 231 aagccatcct acgtaaccca gtcaccctag nctaagtaat aatgttattt aatcaaaggt 2640
W--> 232 taaatattta tttttgctta gaacttatta gatcatctca gnaaaagtca gaggtaatat 2700 
233 ttgggcctgg 2710 

235 <210> SEQ ID NO: 4 

236 <211> LENGTH: 2059 

237 <212> TYPE: DNA 

238 <213> ORGANISM: Homo sapiens 

240 <220> FEATURE: 

241 <221> NAME/KEY: misc_feature 

242 <223> OTHER INFORMATION: Incyte ID No: 240091.1 

244 <220> FEATURE:

245 <221> NAME/KEY: unsure
246 <222> LOCATION: 1850

247 <223> OTHER INFORMATION: a, t, c, g, or other
249 <400> SEQUENCE: 4

250 cgcgctccgg acctggcagg cggcggctgc agggcaggtc caggggccac atggctgagg 60 

251 gggacgcagg gagcgaccag aggcagaatg aggaaattga agcaatggca gccatttatg 120 
252 gcgaggagtg gtgtgtcatt gatgactgtg ccaaaatatt ttgtattaga attagcgacg 180
253 atatagatga ccccaaatgg acactttgct tgcaggtgat gctgccgaat gaatacccag 240

254 gtacagctcc acctatctac cagttgaatg ctccttggct taaagggcaa gaacgtgcgg 300
255 atttatcaaa tagccttgag gaaatatata ttcagaatat cggtgaaagt attctttacc 360

256 tgtgggtgga gaaaataaga gatgttctta tacaaaaatc tcagatgaca gaaccaggcc 420
257 cagatgtaaa gaagaaaact gaagaggaag atgttgaatg tgaagatgat ctcattttag 480
258 catgtcagcc ggaaagttcg gttaaagcat tggattttga tatcagtgaa actcggacag 540
259 aagtagaagt agaagaatta cctccgattg atcatggcat tcctattaca gaccgaagaa 600

260 gtacttttca ggcacacttg gctccagtgg tttgtcccaa acaggtgaaa atggttcttt 660 

261 ccaaattgta tgagaataag aaaatagcta gtgccaccca caacatctat gcctacagaa 720 
262 tatattgtga ggataaacag accttcttac aggattgtga ggatgatggg gaaacagcag 780
263 ctggtgggcg tcttcttcat ctcatggaga ttttgaatgt gaagaatgtc atggtggtag 840

264 tatcacgctg gtatggaggg attctgctag gaccagatcg ctttaaacat atcaacaact 900 

265 gtgccagaaa catactagtg gaaaagaact acacaaattc acctgaggag tcatctaagg 960 

266 ctttgggaaa gaacaaaaaa gtaagaaaag acaagaagag gaatgaacat taatacctga 1020 

267 aactatagga aaggttaatt tgcctataat tatatataca ttccatagtc atcaaggaat 1080 

268 atattgtgca gagagagtat ccttgactgc ttaagtcagc cagttcagca tggataccaa 1140 

269 cattagcttt tcttcttggt tatatcatct gccaaaaata gagaacttat gatctattca 1200 

270 tgtgtgtttc aggcttattt gggagaacta atttgaactt aatcaccact tcatctaatt 1260 

271 ttagcaaggt aacagttgcc cagggcagta cctgaattaa ctgtccattt cagtacatgt 1320 
272 caagtgcctt tgttaggtgg agaagaaatg tctctagagg aatataaata cctgatttct 1380
273 tgtcatcgag attcttgtac tgttaaatga atattgcctt ttactgctct ttatggctta 1440

274 ttggaatagg agctcattta agattgatct tggagagttt cttcttgtga ttttagttca 1500 

275 taagtatgtc acctttcatt ttatagtgtt catcattgag taatggatta agtgaaaatc 1560 

276 caggagtatc catctgcagt tatgtgctga ggtgataatt catccaacat atttgttagc 1620 
VERIFICATION SUMMARY DATE: 01/14/2002

PATENT APPLICATION: US/10/009,416 TIME: 11:48:13 

Input Set : A:\ipptl042usn.app.txt 
Output Set: N:\CRF3\01142002\J009416.raw 

L:31 M:270 C: Current Application Number differs, Replaced Current Application Number 
L:33 M:271 C: Current Filing Date differs, Replaced Current Filing Date
L:36 M:256 W: Invalid Numeric Header Field, Wrong Prior FILING DATE: YYYY-MM-DD 
<110> INCYTE GENOMICS, INC.
Hodgson, David M. 
Lincoln, Stephen E. 
Russo, Frank D. 
Spiro, Peter A. 
Banville, Steven C. 
Bratcher, Shawn R. 
Dufour, Gerard E.
Cohen, Howard J. 
Rosen, Bruce H. 
Chalup, Michael S. 
Hillman, Jennifer L. 
Jones, Anissa L. 
Yu, Jimmy Y. 
Greenawalt, Lila B. 
Panzer, Scott R. 
Roseberry, Ann M. 
Wright, Rachel J.
Daniels, Susan E.



<120> MOLECULES FOR DISEASE DETECTION AND TREATMENT



<130> PT-1042 PCT
<140> To Be Assigned
<141> Herewith

<150> US 60/137,412; 60/147,500; 60/147,501; 60/147,542
<151> 1999-06-03; 1999-08-05; 1999-08-05; 1999-08-05

<160> 14 

<170> PERL Program 

<210> 1 

<211> 3101 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature

<223> Incyte ID No: 222197.6 

<220> 

<221> unsure 

<222> 3077, 3084, 3093, 3097-3098 
<223> a, t, c, g, or other 



<400> 1 

agtgattgca tgagcttagg gaggggagtg acatgatctg atttacgtct gtggaagacc 60 



WO 00/75298



PCT/US00/15344



actctgggtg ctgcatgggg gactggactg ttgggtgagc agaggcatga gagtagagag 120
aggactggtc agcaggtgat ctaagcactc cccagatccg atcacatagg acagtatgca 180
ccttaagatc ctgaagaaac ggcacaaaat gttcaagtga tgtttagaaa taacttgtga 240
gggtgcgtca gggaaatcat gcagccatca ggacacaggc tccgggacgt cgagcatcat 300
cctctcctgg ctgaaaatga caactatgac tcttcatcgt cctcctcctc cgaggctgac 360
gtggctgacc gggtctggtt catccgtgac ggctgcggca tgatctgtgc tgtcatgacg 420
tggcttctgg tcgcctatgc agacttcgtg gtgactttcg tcatgctgct gccttccaaa 480
gacttctggt actctgtggt caacggggtc atctttaact gcttggccgt gcttgccctg 540
tcatcccacc tgagaaccat gctcaccgac cctggggcag tacccaaagg aaacgctacg 600
aaagaataca tggagagctt gcagctgaag cccggggaag tcatctacaa gtgccccaag 660
tgctgctgta ttaaacccga gcgcgcccac cactgcagta tttgcaaaag atgtattcgg 720
aaaatggatc atcactgccc gtgggtgaac aattgtgtag gagaaaagaa tcaaagattt 780
tttgtgctct tcactatgta tatagctctg tcttcagtcc atgctctgat cctttgtgga 840
tttcagttca tctcctgtgt ccgagggcag tggactgaat gcagtgattt ttcacctccg 900
ataactgtaa tcctgttgat cttcctgtgc cttgagggtc ttctgttttt cactttcact 960
gcagttatgt ttggcaccca aatccactcc atatgcaacg acgagacgga gatcgagcga 1020
ttgaaaagtg agaagcccac atgggagcgg aggctgcgat gggaagggat gaagtccgtc 1080
tttggggggc ccccctcact cctctggatg aatccctttg tgggcttccg atttaggcga 1140
ctgcccacga gacccagaaa aggtggcccg gagttctcag tgtgaggcgt ggctcatcag 1200
actgaaactt gctcacagac ttccagttat ttatttgggg tctgaaggat atcaacagct 1260
catctgtgac caacagggca actggaacct acacaaacca attgcttgca gcaagcagag 1320
ttttatatat ttatagtcac agatggcaga ggaagaggct ctcagtcccc acctgtacaa 1380
caacggaaag gtgtgtggcc acacgaagaa gccaaacgcc gtggcctcct gcagagctgg 1440
ggcttctgtg gagaatactt cgggttatta catgggttat tcaaatcctg ggtcctgagc 1500
tgctgtttcc aatcatgaag aaaaacagtg aatccagtga acagggattc tccaagcagt 1560
catttcaggg ggctcctgct gaccccgcca ctcagcagtg cactccccgg atcacagcag 1620
ggcgtttaca tagaaagacg ttttggtctc gattagctcc gatgctttgc gctgaagttg 1680
caaaagatct gtgcactgaa cagtgaaggt ggcttccggc acactccccg ctgccccgga 1740
agagacatcc tttgaccctc tcagcaagtc tgtgtgtgtg cgtgtctgtg cgtgtgcgcg 1800
cgtgtgtgca tgtgtgtcaa aattgccagt gttgtttagg caatgtaaca tttaccggct 1860
gtgtacagca aacaagctat tttttagaaa ccgacgtttc agggaagagg ggagagagcc 1920
gcggggtcct gcccgtggtt actatgaatg tattgctgtt ggaggacatc tcgatccaaa 1980
gaacagccgt tcctgtgcgg cccttcgttg ccctcctgct ttcatttttt aaagaaatct 2040
tgagtgcttg agggccttgg aactgatttt tttttttttg ttccagccaa attagcagtg 2100
tataaatggc acctaggtaa gagcagagct gcggctcggt gacttgatac ttggggcagc 2160
cccgatgctg tgtgtggggc aggggaggca tccttactgg agaggcaggg cccagccatt 2220
gggcacctct gggaagggga ggggaccatg aggcagccag cccctggcag gggcgactgt 2280
gccaccgcag gcagcgctcc agtccgggaa tggccaggat ggcgccctct tgttggagtt 2340
tttggttagc ttttacgttt tcttctccac ccacggcaca ggtgataaaa taggatcctt 2400
ggtgcggact ttaaaattat gccagaaagc caacagctcc cctcgtgggg ccttgcctta 2460
aacttgcctg gtttgtacat tttttgccgg acgcatcaag aagcaatctg tgacaaagtc 2520
tgagggtctt cctttatgct tgccctccac actaagagaa gttggcgtct ccctcctggg 2580
aattgttttg cctttctgtt catctgtgaa ctgttttttg tttttaatta ctctgtaccc 2640
catccgaatc agggcttcta ccactgctga tgcaaaacca caaagggacc tacctgagcc 2700
accgtcctag ccaagcgagc aaacctgcag ggggtttgga agtggacttg gtcaccgcag 2760
aagcgtgtgc gccgttgggg gaagagctgc gtcacagcca gagggacaaa gtgtgggtga 2820
tcctggagac gccagtttcc gagattgttc tgcatattca tttgcacatt gttgtctggg 2880
ttggacatgc gtgtgggctt cagtgtgagg cttttaatat gtatatcctg ttatcaataa 2940
aacaattatc caagtggttg aatcctgtga gacttggcaa gtgtgtgcaa atcaagtata 3000
cttgactttt caacctcttc tttcaatgta acttttatat gaaataaagt aatcaattaa 3060
cagttctcaa aaaaaanaaa gggnggccgc cgnctannga g 3101

<210> 2
<211> 2561
<212> DNA
<213> Homo sapiens

<220>
<221> misc_feature
<223> Incyte ID No: 227709.3

<220>
<221> unsure
<222> 126, 2144
<223> a, t, c, g, or other

<400> 2

gcgggcgcgt cgccctctgc ccccgccggc accctggcca tgacaggcaa gtcggtgaag 60
gacgtggatc ggtaccaggc tgtcctggcc aacctgctgc tggaggagga taacaagttt 120
tgtgcngatt gccagtctaa agggccgcga tgggcctctt ggaacattgg tgtgttcatc 180
tgcattcgat gtgctggaat ccacaggaat ctgggggtgc acatatccag ggtaaagtca 240
gttaacctcg accagtggac tcaagaacag attcagtgca tgcaagagat gggaaatgga 300
aaggcaaacc gactttatga agcctatctt cctgagacct ttcggcgacc tcagatagac 360
ccagctgttg aaggatttat tcgagacaaa tatgagaaga agaaatacat ggaccgaagt 420
ctggacatca atgcctttag gaaagaaaaa gatgacaagt ggaaaagagg gagcgaacca 480
gttccagaaa aaaaattgga acctgttgtt tttgagaagg tgaaaatgcc acagaaaaaa 540
gaagacccac agctacctcg gaaaagctcc ccgaaatcca cagcgcctgt catggatttg 600
ttgggccttg atgctcctgt ggcctgctcc attgcaaata gtaagaccag caatacccta 660
gagaaggatt tagatctgtt ggcctctgtt ccatcccctt cttcttcggg ttccagaaag 720
gttgtaggtt ccatgccaac tgcagggagt gccggctctg ttcctgaaaa tctgaacctg 780
tttccggagc cagggagcaa atcagaagaa ataggcaaga aacagctctc taaagactcc 840
attctttcac tgtatggatc ccagacgcct caaatgccta ctcaagcaat gttcatggct 900
cccgctcaga tggcatatcc cacagcctac cccagcttcc ccggggttac acctcctaac 960
agcataatgg ggagcatgat gcctccacca gtaggcatgg ttgctcagcc aggagcttct 1020
gggatggttg cccccatggc catgcctgca ggctatatgg gtggcatgca ggcatcaatg 1080
atgggtgtgc cgaatggaat gatgaccacc cagcaggctg gctacatggc aggcatggca 1140
gctatgcccc agactgtgta tggggtccag ccagctcagc agctgcaatg gaaccttact 1200
cagatgaccc agcagatggc tgggatgaac ttctatggag ccaatggcat gatgaactat 1260
ggacagtcaa tgagtggcgg aaatggacag gcagcaaatc agactctcag tcctcagatg 1320
tggaaataaa aacaaaacac ctgtatggct gccattctct tcagccctgc gctctcccct 1380
ttccacagcc tccacccctg acccccatcc tcttttccta cctctctgtt tggtttagaa 1440
attgctcaat aagtcatttg gggtttggca tcctgcccag ccacttccca aacatgaaga 1500
cctctctgtt gctttatgtt gtacatgccc catagccatc ccaacgtcct ccccagtcct 1560
ctcctggcac cagcacctta gaagttgttg gcagaaggca cttaaactgt gggagaagtg 1620
tgcacacctt tgagtccctt ccctcaaggt taaagctcct gtcagactct cagaagggtc 1680
tgtgggtgtt gtatattagg caaacagggg aaagcttaga ggtccttcta tatgtgttaa 1740
taagctgttt ctaagtgttt aaatttgaaa agcatcatgt tctcatgatt tatgggaatg 1800
aagcaagtac tgaaatcaaa ttaaatactc cctgggtcct gggtcagttt gaccctagcc 1860
ctggggtgag gcaagccccc tcctatgagg atgagcaaaa atactactct cttcgccctg 1920
agttgctttc tggatctggg gcttcaggac ttgctgcttc agtcagcctt tattagcacc 1980
aaagacttta tgaagatccc acacacagac acacatccct tcccgcctcc cccctgcctt 2040
cagtaggatc tggctccgtg gctggaggac caacccctat agtgggaatg cagagcttaa 2100
cgtgtactgc ttgtgtgtgt gcgtgaagtg tgtgtgtgtg taanaagtgt gtgttccgcc 2160
tcccaccctc tccccatctg ctctgggtat ttttgttttt gtttagtttt aggtttacaa 2220
cagagaggaa ttaatttatc agcagcctaa aactgttgtg tttttcttat ggtttaaaaa 2280
acgccatgtc attgataact ccctttctcc cttcccttct cccggtctgc tgatcactct 2340
ttcatgcctg tgtatccagg gtgctctgtt tccccaccgt tcccaggtgt acgaggcaga 2400
gggccgggac agctttcctc tcagtcattg ttcaccccac ttgaaaattc agacaagaaa 2460
actttgctta aaagatttca tgtgtgggaa ccacagttcc tggctgcctt tctcctgtgt 2520
atgtgtaaat tccttaataa atattgcagg gaaggactgt t 2561



<210> 3 

<211> 2710 

<212> DNA 

<213> Homo sapiens 



<220> 

<221> misc_feature

<223> Incyte ID No: 237703.2



<220> 

<221> unsure 






<222> 712, 799, 2332, 2334, 2342, 2470, 2611, 2682
<223> a, t, c, g, or other

<400> 3

caggracc-c aecacattat tttatttaat cat-cargtat ccczazaagz aacatrcctic 60
tcT:rrrat:ag agtaagaaat tgagactcag gaatartiaac ctgcccagga ccarcgaagr 120
ggaatigaaca tcaaaagccc attccctctg czzggcca.cz cccaccccat tttaccaagc 180
tcccccargc ctgcgrcagt aaactaacaa craaaagggr crcgcactct aaacagcrrt 240
ccaacccaag agcacgccac atttaaccag aggcccarag aacaaactga aaaccacaac 300
ctaaaaggrc gcttccaagg ttgcaecgag aaggaatrga gcccccgaat ccctagaart 360
ccrrantaat acrttattct cctgttaaaa grttcatrtt. taaaagtttc acacagtgrg 420
taractggtig tgataatcct acagaaaaac caagcagc-ca cgctctcrtc acagataaca 480
cataaaacat taaacagaaa gcccacgtta cccaccggac tgaagctcct acgcaaraaa 540
ccttagtrgg accaggagca aatgtatggt tcgacacTica gagaacctca ttcttagaag 600
caacaaagcg cagttaacac taacttgctc acrcrcaaat cagtagtccr cccctcccca 660
aaaagagatc ttaaattatt ttcatttaaa gtcacctact aacaagtaag tncttaccca 720
actcaaccaa atctaacacc acaagacaat tctgcttnag ttattgtttt ggttcgagtt 780
gagtngaaag atttccttnt tttccrccca gcrraccaca gcgcggagac cgcctcccga 840
aaaggccact cacgtgaaca ccagggatga agargagcat acccctcticc accgagcagc 900
ccacagtgga cacrtagata tcgtccagga gcccatcgca cagggggccg atgttcargc 960
agtgacLgtg gacggctgga cgcccctgca cagtgcctgr aagcggaata ancccagagr 1020
ggctcctrrc ttactgcagc atgatgcaga taccaargcc caaacaaaag gcctctrgac 1080
ccccttgcar cttgctgccg ggaacagaga cagcaaggan accccagaac ccctcctgac 1140
gaaccgttac gccaaaccag ggccgaaaaa caacrtggaa gaaactgcat ttgatatcgc 1200
caggaggaca agcarctatc accacctccc cgaaatrgtg gaaggctgta caaactcctc 1260
accccagtct caacaattcu agcaactttc ccaagtrccc aaataccagt gcctcccgcg 1320
tgtgagacgc atccccacaa tcaaagctga cg-caaaca- cttactacaa aaattcagcg 1380
acactcatta naacattctt ccaagrgaac tgcccgactr tgatgtcaaa argtactcga 1440
aagtaactcg cataraccct taattatttc tgtggagctr gcgatttttt tatcagaaat 1500
aactttaatg tgcgcacacr taaaaacctg acacgggccg cacagaaacc ggcactttcg 1560
gcgctgacac aagagaaatg tatttttaaa tarcccacar cccggatcrt tgtrgggtat 1620
ttagracatt gacaratatt rctacaaggt gaggtaactc agaacttaat tcaaaagtct 1680
caaa-arcct gacacaactc agctgtcccc Ticcaccttac cacagccagc cgctittcatt 1740
ttaaaccaga gcaagtaaca catcagcgac tngaarctrc ataagttaaa gcaaaaaaca 1800
gcaaaaaacc tagatctttg tctcccagaa cacagaccac crtcaggaaa gcagttagct 1860
aagtgctraa ttcacgaata ctgcacactg catcccctac cacaatttac acaarccrgt 1920
ggatagccct accccaccct ggccaaccca cacgatcccc aagctaacgg cgaancacga 1980
tgaccttgta gacatgcaca caactatacc crcgcccaac agatcataat acatctgcta 2040
cccaactggr tttacccgcc taaccccacr gactrgggca ctgcttgcat agtctcccaa 2100
gctcacagga aacgctgatt ttccaaggtc crcatccnta cagagcatac aggcaaagtg 2160
acaggggaaa aggaactagt ccaagagcaa ggggacgact atra-arcga ggctaaaacc 2220
acaaagtggc ccaggctnta aaaaaaaaac actrgcggana acgacaaaaa gcataagcaa 2280
aaacatttga gaaaaataaa gcacaagctc tigaacaacac aaaaggcacg antncacttt 2340
tnacccgcgt atgtccctct cggatccaga acattattca tccagcacgc actcagctat 2400
tcaacaccca ctcactcagt ctccccagca gcaactttcg cattgcctac ctagcccctt 2460
cgtgartgcn cccaaagttt: tgtcttctca acaccacaac actccagggg aagggaacta 2520
aaccagctgc tcctcacctc agttaaactt ccaagatgtc caccaaggct tatccccttc 2580
aagccatccc acgtaaccca gccaccccag nccaagcaac aatgttattc aaccaaaggt 2640
taaatactta ttc::tgctta gaacrcatta gaccacccca gnaaaagtca gaggcaacac 2700
ctgggcctgg 2710

<210> 4
<211> 2059
<212> DNA

<213> Homo sapiens
<220>

<221> misc_feature

<223> Incyte ID No: 240091.1

<220>






<221> unsure
<222> 1850

<223> a, t, c, g, or other



<400> 4 

cgcgccccgg accrggcagg cggcggctgc agggcaggtc caggggccac atggctgagg 60 
gggacgcagg gagcgaccag aggcagaatg aggaaattga agcaacggca gccactrarg 12 0 
gcgaggagtg gtgtgtcact gatgactgcg ccaaaatanr tcgcatcaga actagcgacg ISO 
anacagatga ccccaaacgg acaccrrgct: tgcaggrgac gctgccgaat gaacacccag 24 0 
g-cacagctcc acctiatctac cagccgaacg ccccccggct caaagggcaa gaacgtgcgg 300 
atttatcaaa tagcctcgag gaaacatata tccagaatat cggtgaaagc actctttacc 3 60 
cgtgggcgga gaaaacaaga gatgrnctta cacaaaaatc tcagatgaca gaaccaggcc 42 0 
cagargtaaa gaagaaaact gaagaggaag atgccgaacg tgaagatgac crcatrttag 480 
cacgtcagcc ggaaagctcg gtcaaagcat cggactctga tatcagcgaa actcggacag 54 0 
aagcagaagt agaagaatca cctccgaccg atcacggcac ccctiacnaca gaccgaagaa 600 
gnacctttca ggcacacctg gctccagcgg cctgccccaa acaggcgaaa atggcncttt 650 
ccaaactgta tgagaanaag aaaatagcca gcgccaccca caacatctat gcctacagaa 720 
cacatcgcga ggacaaacag accttcctac aggatngtga ggacgacggg gaaacagcag 7 80 
ccggtgggcg tcttcttcat cncatggaga rtttgaatgt gaagaacgtc anggcggtag 84 0 
ta::cacgct:g gnacggaggg attctgctag gaccagatcg ctcraaacat accaacaact 900 
gcgccagaaa cataccagcg gaaaagaacr acacaaattc acctgaggag tcatccaagg 960 
ctctgggaaa gaacaaaaaa gcaagaaaag acaagaagag gaacgaacar caaraccnga 102 0 
aaccacagga aaggtcaatt cgcccacaat tatatacaca ctccacagtc atcaaggaat 1080 
a-attgtgca gagagagcac cctrgaccgc tcaagccagc cagctcagca cggataccaa 1140 
cattagcctt ccttctcggt tacatcatct gccaaaaaca gagaacttat gacctatcca 12 00 
tgcgtgrtrc aggcctactt gggagaacta acctgaactr aatcaccacc tcatccaacc 1260 
tcagcaaggr aacagttgcc cagggcagca cctgaaczaa ccgrccactt cagcacacgt 1320 
caagrgcctc cgtcaggcgg agaagaaacg tctctagagg aacataaana cctgacctcc 13 80 
tgtcatcgag attcctgtac cgc::aaacga acacrgcctc ttaccgccct cracggctta 1440 
crggaatagg agctcactca agatrgaccc tggagagntt crtcttgtga ttttagctca 1500 
Caagtatgtc acctttcact Ctatagcgtc caccattgag taarggacca agcgaaaatc 1560 
caggagtacc catctgcagt tatgtgccga ggcgacaact cacccaacan artcgccagc 1620 
acaaatatta cgcttcagtr cccgctgcaa atcggtgact gtgaaattac agaaagtgac 1680 
rccctagcct: gcctcccrtg trcaactctc gtaacgtaag caacaaatat ggagrgtcag 1740 
cagt-ctccrc ccaccccaga aacgcgctgg tgcaacattc tcgrttcctt taacaacctg 1800 
ggaagcacct CCCttgtgac ctccaccgag gaatcagaac tacgacagan grtaggctgt 1860 
ggcaaacggg acattcgr.ag agcgggacag aggcggcaga acgaacctgg cgtagggcag 192 0 
gagtacgttg tgtagtaca:: caactcgatg cacgcttrcc acccgcaccc cagacggctt 1980 
tctcagcccc aagactttgc agagagaagg agcaaacctr ctcarcggaa aaacagaaac 2040 
aacccccccc cccacrctt 2059 

<210> 5 

<211> 3705 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature

<223> Incyte ID No: 243096.6 

<220> 

<221> unsure 

<222> 13-14, 2121 

<223> a, t, c, g, or other



<400> 5 

cgcagcgccg gannccgcag cgccggaacc ncagaggcgg gtcgcagcgg cgcagaggag 60
gtcagctgcg ggagcgcttc cggggacggt gccgccat-ga gattgacccc gcgcgcgctg 120
tgcagcgccg cccaggccgc ctggcgggag aactcccccc Cgcgcggtcg cgacgtggcg 180
cgctggttcc cgggccacat ggccaagggg ctgaagaaga tgcagagcag cccgaagccg 240
gcggactgta tcaccgaggt ccacgatgcc cggatcccac tctcaggccg caaccccctg 300






tt-caggaaa cccrrgggct tiaagccrcac rtgcrggtcc rcaacaagac ggacrtggcg 360 
gacctracag agcagcagaa aatcatgcaa cacrragaag gagaaggccr aaaaaargrc 420 
atttcracca accgtgraaa ggacgaaaat grcaagcaga ccatcccgac ggtcacrgaa 480 
ctgacwoaga gaagccaccg ccaccaccga aaagagaacc cggagcaccg T:at:cacggT:c 54 0 
attggggtcc ccaacgcggg caagtccccc crcaccaacc ccccccggag gcagcacctc 600 
aggaaaggga aagccaccag ggtgggcggc gagccnggga tcaccagagc tgtgatgrcc 660 
aaaartcacg tctcrgagcg gcccctgacg ccccrgtcgg acacrcctgg cgngccggcr 720 
cctcggartg aaagtgcgga gacaggcctg aagccggccc tgtgrggaac ggtgccggac 780 
caccragtcg gggaggagac catggccgac cacctgccgt acaccctcaa caaacaccag 84 0 
cgcctcgggc acgtgcagca ccacggcccg ggcagngccr gcgacaacgt agagcgcgrg 900 
ccgaagagtg cggctgcgaa gctggggaag acgcagaagg tgaaggcgct cacgggcacg 96 0 
ggtaacgcga acgttattca gcctaactac cctgcggcag cccgcgactt cccgcagact 1020 
ttccgccgcg ggctgctggg ttccgcgatg ctggaccccg acgtcctgcg gggccacccc 108 0 
ccggctgaga ccttgccccg aacttgtccg ggtagggagg gccggaggca tgtggcctcc 1140 
cagaccrcct gacctgggcg gtrgaggctc aagacagctc acccggrcca gaagccccat 1200 
gctggtcact: agggcgctgt gccctcrggc gccccacagc ctggccagct: ccagggaccc 1260 
cag=cgcagg gcccaagcag gtgggagtgg acaccaggct tcccagtgga cgtccctgag 1320 
cagctccgca tgcttggccc tcccggagcc ccctgctcag gcctcrtgag aaarggacgc 13 8 0 
tgtcccagaa ggagttaaag ctataacccg taacctttaa aatc::ccagc taaagggcct 144 0 
gtctccracc ggccrgcgag gtgcaccgta gtgccttggg cctgtgtgtt aaagctgctc 1500 
tcaccacrcg aacctaagaa acgagcaggt cggcagcnag ggtttgcgrc ggaggccttc 1560 
ggtccagcgt: ctcgcagccc tacaacaagt gagaggcrng crgccatcag agaggccrat 1620 
ttcacactta caggcacaca cagacacaga ccagagaccc ccagcagcag agcccaagca 1680 
ctggcttcgc cccccagtgc cctggggcat gttcagggca gggccgaggg ggacgccctg 1740 
cacatggcrr cgctigtgcaa cgaccggaag gccgcccggc acgggcagta gagacccccg 1800 
gcccctgagc acctcctagc tcactggtag tgggattctg cattagtggg gcrgagagat 1860 
gtgggggccc ccccagcccc antatagcgc acctgaaggg gcccacagcc tgcgtcctag 1920 
aagagggaag aagaaggaag gcgggcgggg ccggcagcat ggacraaggt gctgcaggac 19 8 0 
ctggggccag ggacaccccg tgcagaagcr ccggctgcct crctgcggrg gcggcctgac 2040 
cgtcccacag cagcctccac cagggccccg gtgctcagcg gcccctcttc gccggctggc 2100 
tgcccccgct gccccatacc ncacacactc atcagcccga agccagcccc tgagtgccac 2160 
ctgcatcgcg ccataacccr. gaccgcctgg ggcaggaagr atccaggctg gctgtgticag 2220 
atgccaatgt gctgaatcaa cagtcattgc agatcacgaa gtgcccatca taaccggaac 22 80 
actccaccag critgcagtgc tgtggcgtgt gagggcctgg tgcagcccag cccaccctcc 2340 
aggcgggcat cngcaaagtt gagggggctc cggtgggtcc ccccgctgng aggagaccca 2400 
gaccaccccc cgcctcctigg gggaaargrc agaagggctt ccctgcccac gaggatccgg 2460 
ggcagggctt ggccctggcc tgctggccct: ggaggcgrtg agcccggtcc ggaaggggcg 2520 
gaggagcgtc tgggcccact gggccagggg cartgccggc agtgcggacg ggaggccgca 2580 
gggcgccgcc ccccgcggct cagcgccctg gagccagaga gcagtgctcg gctgagcccc 2640 
gccaacagcc nccagaccct cacccaggcc agaacccagg ccagcrgggg aaggcagagg 2700 
ctagcagggc ccgtggcggg tgctggcctt gactrrggrg tccaccgagt cccgaggccc 2 7 60 
aggcccagga gggacgcagt ccggctgagg gcgaggcngc caccaggaca tggagagggc 2S2 0 
gagatcccaa ggccacgggg gggggggcag ggagaacccc tcctacccrg gatgagcggg 2 880 
tgaccggaga gctagagaac gcggcagacc caagacctcc cagtgctgag cccacggagg 2940 
atgccccagg ctggcgggac tgggaagcag agggcrggcc traacacagg tgcgtccagt 3 000 
gctggaggca agtccttgcc gcgactgccc agcgccactc catgtctctc ccgcccttgg 3060 
acgttggggg gctcagcctc ctgcacgggt gccccgctgg gcgctgggcc ccgccactgg 3120 
cccccrgccc gcrttggggt ccgagcragc tcccggctcc actgagcagg ccgtcagccg 3180 
ccagcccacc acgcggatac ccaggccctg ttccgaggcc tggaacagct gctcccgaag 3240 
aaggggctgc cttcagggaa acgcgtgcac cgtgcagcct grgccgtgcc cagggaggcc 3300 
tcttcagcgg gantggcagr tgctgrgccc tgagaacagg cagaactgtg tgatccctga 3360 
atgtgaacct gaagctcaaa ggacctggaa agcrccggaa cgtgtcggct ttticcccccc 3420 
aaaatgggcc ctaaggaggg taaagcgacc tgcttcaagt tgttggagca aagtgggcct 3 4 80 
ctcacggatc tcggcctgag ggcgtggggg agaaggcccg gacagcccct; cagggcaggg 3 540 
tgcgtccccc caccagccgc agagagccag gacggacgct cctcggacgg acggtttccc 3 600 
tgcttgggaa tgctcccggg ctgtgagacc cactctcctg ggcaggtggt tagcacctaa 3660 
cgttcttccc tcacnccccc ccaaactcnr. aagtcctctg gtcca 3705 

<210> 6 
<211> 3644 
<212> DNA 






<213> Homo sapiens 
<220> 

<221> misc_feature

<223> Incyte ID No: 244366.6

<400> 6

gcrttccacc tgcaaccatic tgcatgrgca cagcccaccg tttgtctcca gtttttaaac 60
tgtacaagtr gcgctrcrtia accttcccrc ccgcct:rgrc ccggggaggt ggctarrcar 120 
cattcggaat cacctctccc ccccccacgt: gctcrccttc acctgagacc ccttgaccct. 180 
cggcctcact tgggaggggg aagggcgaca aagttttccg ctrcccrggt crcccttcg;: 240 
actccccccc gttgcccccc ccctcccact ttctcgcctg ctctgccgct: gcgcgggccr 3 00 
gggctacgcg gcagggcaga tttcccatca gagcrccaac atgcccgcag agnccggaaa 3 60 
gagacccaaa cccagcaagt atgtcccggt ccctgcagcc gccatcctcc tagcgggagc 420 
tacgacaccc ttctttgcct ttacgngtcc aggaccaagc ctgcatgcgc cacccgcag- 480 
gcccatccac aacgcaacca tgrtcccccc cgrgctggcc aacttcagca tggccaccrc 540 
catggaccca gggactttcc ctcgagccga ggaggatgag gacaaggaag atgatt-ccg 600 
agctccccnt tacaaaacag tggagacaaa gggcacccag gtgcgcatga aatggcgtgc 66 C 
cacctgccgc ttctaccgtc cccctcgacg ttcccactgc agtgtctgcg acaaccgtgr 72 0 
ggaggaatcc ga-ricatcact gccccrgggc gaacaactgc atcggtcgcc ggaaccaccg 780 
rtattttttc cttttcctcc cttccccgac agcccacatt acgggcgcgt tcggcnccgg 840 
cctccccna:: gtcccccacc acatagagga actcccaggg gcccgcacgg cngrcacaar 900 
cgcagcaarg tgcgtggccg gcccaccctt catccctgca gctggcctca cgggacttca 960 
cgtggtcccg gtggccaggg gacgcacaac caatgaacag gttacgggta aarrccgggg 1020 
aggtgcgaac ccc-rcacca atggctgccg taacaatgcc agccgtgttc tccgcagttc 1080 
rccagcaccc aggcatttgg ggagaccaaa gaaagagaag acaartgcaa tcagaccccc 1140 
crccccccga ccagaagCCt cagacgggca gataactgng aagancacgg ataacggcar 12 00 
ccagggagag ctgaggagaa caaagrctaa gggaagcccg gagataacag agagccagcc 1260 
tgcagacgcc gaacccccac cccctccraa gccagaccrg agccgrraca cagggt;tgcg 1320 
aacacaccrc ggcccggcta ccaargagga tagtagctta tcggccaagg acagcccccc 13 80 
gacaccnacc atgcacaagn accggccggg ccacagtagc agcagtaccg tcagccgcca 1440 
tgccgcaccc ctccagcgcc aagctgagtc gt.ggggacag ccrgaaggag ccaaccccaa 1500 
ctgcagagag cagccgtcac cccagctacc gcccagagcc cagcctggaa ccagagagcr 1560 
cccgctcccc taccrttggc aaaagctctc acttcgaccc actacccagt ggcccacgcc 1620 
cctccagcct caagccagcc cagggcacag gccttgagct gggccagttg caatccacnc 1680 
gcrcagaggg caccacctcc accnccrara agagcct.ggc caaccagaca cgcaatggaa 1740 
gcctatctta tgacagcttg cccacaccct cagacagccc tgattccgag tcagtgcagg 1800 
cagggcccga gccagaccca ccttnaggct acacctctcc ctccccgcca gccaggctgg 1860 
cccagcaacg ggaagctgag aggcacccac gtccggrgcc aaccggccca acacaccgag 1920 
agccc::cacc agcccgttac gacaacccgt cgcgccacat tgtggcctcc ccccaggaac 1980 
gagagaagcc gccgcgccag tcacccccac tcccgggccg cgaggaagaa ccaggcccgg 2040 
gggacrcagg cactcagtcg acaccaggct cgggccacgc cccccgtacr agctcctcct 2100 
cagatgatrc aaagagatca ccttcgggca agactccact gggacgccca gctgcccccc 2160 
gccttggcaa gccagatggg ctaaggggcc ggggagtagg gtcccctgaa ctcaggccca 2220 
acagccccat acctgggccg accgacgcct tacagcagcc ...aaaagccca acccggtgcc 2280 
tctgagacag aagaagcggc crtgcagcca ctaccgacac ccaaagacga agtacagccg 2340 
aagaccacct acagcaaacc caacgggcag cccaagagcr taggctcagc crcccccggc 2 4 00 
ccaggccagc cacctctcag tagccccacg aggggaggag tcaagaaggt gtcaggggct 2460 
ggtggnacca cccatgagac ctcggcgtga gccttcggca ccr:cccczcc ccaacgccrc 2520 
tgcgcccaca ccaaagggcc ccaggtggcc acczzccx^vc cctcaagggg cncccccccc 2580 
gtgcacggac atrccttaaa ccaccgacrc caagaggatg aggagcgttt tccaaaatgc 2640 
agcaggccrg gggagtcgga gagtcggggc cccgagaccg gggtagcaac ccccccttct 2700 
accttttaag acctcccctt cctcgacccc cggaccagac tcagcggaca tttgtgcaat 2760 
tgcccgccct ggagggaacc agatcacttt taaaccagaa ataatcrttt ttattattgt 2820 
tacggattcc attirttttcc tctcccgcgt taccaggcgt gtgtgcacat ataatatata 2880 
tatatacaca ttataaacac caaagaaact aeacacctac cctgggacgg gaaaacgagg 2940 
gagggataca tatacggagg gggacctcac cctccccatt: ccccagacca gcaggaaaag 3 000 
aggggagacg tcagcc-ctc tcctgtggct ccctcrcatc tgncccagtt actaaccacg 3 060 
ggaaaragca ccctccgctg gtgccaagtg tgattaggaa gaagcctggg gagaggcgag 3120 
tccggaatct tgg:;cacaag agggaaggac tcggagagga gaatcagttt cccaggctca 3180 
ttggcactta gtctccctag gaaaggggcc aaaactrcaa gacaccggtg gtggcgggag 3 24 0 



g&v.x.v.x-s.a.ai 
caaaaacggg ggatagagag 
ggctaccccc cagcccaccz. 



cccaaggcca ggagagaaga acccaaaaag caacar;ct;rt: carcacargc 3 42 0 



aaggagcggc aggcccaggc ccc-ccgacc gccccrrggg 3460 
cactacggt^g ct.gggt:agag gggataccrg ggrtctaacc 3540 
ra tcccgarrag 3600 



rcraaaragg ggagarccca gccrccacaa agaggcccct rractt 
ccactccaaa ccaacgagga acaaaaagaa atcccgatct aaaa 



<210> 7 

<211> 3117 

<212> DNA 

<213> Homo sapiens

<220>

<221> misc_feature

<223> Incyte ID No: 405313.4

<220> 

<221> unsure 

<222> 64, 521, 534, 547 

<223> a, t, c, g, or other



<400> 7 

gtccgcccgc ggccccggcg gcgccaggrg cgr'ccacc.ct; gcccggcccc agccagcgcc 60
cgcngccgcc gcagctgccc caggctcccc gccccgccgc cgagatggcg acgcgctcct 120
gtcgggagaa ggcccagaag ccgaacgagc agcaccagcr caccctaccc aagcctctga 180
gggaggagga caacaagtac tgcgccgacc gcgaggccaa aggncctcga tgggctcccn 240
ggaaca::tgg cgtgcttiatc cgcaccagac gtgccggaat tcaragaaat cttggggtcc 300
acatatccag ggccaaatca gtcaacccag accaacggac agcagaacag acacagcgca 360
tgcaagacat gggaaatact aaagcaagac cacrctatga agccaacctt: ccagagaact 420
ttcgaagacc acagacagat caagcagcgg aattttccat. cagagataaa tacgaaaaga 480
agaaat-acta cgacaaaaar gccatagcta ttacaaaraa ngaaaaggaa aaanaaaagg 540
aagaganaaa gagagaaaag gagccagaaa agccggcaaa accacctaca gctgaaaagc 600
tgcagaagaa agatcagcaa ccggagccta aaaaaagcac cagccccaaa aaagctgtgg 660
agcccactrgr ggatctttta ggacttgacg gccccgccgc ggcaccagtg accaacggga 720
acacaacggr gccacccccg aacgatgacc cggacacctt tggaccgacg attrctaatc 780
ccttacccgc aactgccatg cccccagctc agsggacacc ctccgcacca gcagctgcaa 840
cccrgcccac agtaacatct ggggatctag atttantcac tgagcaaact acaaaatcag 900
aagaagtggc aaagaaacaa crtcccaaag actccacctc atctccgcat ggcacaggaa 960
ccarccaaca gcaaagcact cccggcgcat rtatgggacc cacaaatata ccacccaccc 1020
cacaagcacc agctgcattt cagggctttc catcgatggg cgtgcccgtg cccgcagccc 1080
crggcctrat. aggaaacgtg acgggacaga gtccaagcan gacggcgggg cacgcccacg 1140
ccccaatggg tt-atgggaa acgcacaaac cggtgcgatg ccaccccctc agaacgrtgt 1200
tggcccccaa cgaggaacgg CgggacaaaC gggtgcaccc cagagtaagt ccggcctgcc 1260
gcaagctcag cagccccagt ggagcctctc acagatgaac cagcagar.gg ccggcatgag 1320
tatcagcagt gcaaccccta ccgcaggtct tggccagccc tccagcacaa cagcaggatg 1380
gtctggaagc tcatcaggtc agactctcag cacacaaccg tggaaatgaa aaccgcaaca 1440
caagtctcat ccagaactac cacctgacat tccc:igctga aacgcatcca gttcccctgt 1500
ttattcatat gcatactctt tttctttrca cccatctgct cacattaaga acgatctgat 1560
tgaccgtgtt ggtccgtact gatncaactr gacgcggcga aaagcaggtt gacaaatcat 1620
tttatgtcaa gggcagcttt gctcarattt cccatgatt:: catgcactgc accatttgag 1680
aagccgccca acctgcaaaa tcagctcrcc tctcaataaa artacagccc taacgtttgc 1740
acataaggga agtagccatc angttagraa cacct^ccaat agtiataaacc ccaccccaaa 1800
actagccagt. aarcctgtag gaaggcactg tacgatcaaa tgtttaacca tataaataga 1860
acgtaaangt ctcaccgagc actgttr.cct agtg-accaa aangctccca ctccatcatt 1920
cacttcactg cgcrgttgtt atgacgcgct taacagggaa cgcgattagt gaaaggaaga 1980
taaacgtgga ngtcactcca aaacttcgtt taatgaatgc ttaaagaact caaattttac 2040
ccgcctccct. tgtaatttgg acccctcctt aacgtacaca gcgccaacat gaagaccttc 2100
ttccgcacca catgcaaaca gggcaactaa ccaaaacaaa gccactttca atcctcaatc 2160
cttgaaggca tatccaggtc tatgacagta attgtgrcta cattctacgg tgcctagtat 2220
tgacaaaacg ctatccccct acaccaaaca tgac'ccata gacccttcca tacgtgggct 2280
crratctcct atgargcana ctgccac::aa ccrcccaaaa accaccragt anrgcaaagt: 2340

caggaatcac caggaacgcr tagcrgacaa aaracrrgrc ngrrrraaaa accrgctcaa 24C0 

gcctaccaac crgtrcaagr ct.accaat;ta taagggcaaa rrggagaaaa agaaaaaara 2460 

cacacrcaag agcggcacct tgcagratcg gcaccg:;aca aaaaaarcrc ccaarrragc 2 52 0 

cgtrgtagag aaaacargca gaacaaacga agacaaaaca cacattrtgc accaaccacc 25S0 

caarragcrt: argctaaccg acaagcccca tccaaacaga tg-ccarcag acgacaagaa 2640 

aggccgctgt actgaagtaa aacaaacaat acccgaargc tctgcagccc aaactccaaa 2700 

catcccctcc catatggarc cactggcrgg acaaactgca ccagrcgccg crccaacrca 2760 

taccccaatr ctcacrgtgt ccaggcggna ccrcggctcg rtggcragac raacctrctc 2820 

tgcccgagtg cgccacacga gaaccrgaag gggaaggaaa tagcttgggc agcgcacrct 2880 

ccatggtgac actcgaggtc gggcagcaca agcgcaatga atacctragt gcagctartt 2940 

gcrtccggtt ccagctcccc gactgz-gtt acctgctcga gaaagccaga ctcrtgcacc 3 000 

cccggctggg atccacgacg ctcaaacaca gccrctggat cggacaaaat. gactcgaaga 3 060 

cttacagcaa accccccgcg aaaaacaaaa aaaaaaaaag agactccaaa aaaaaaa 3117 

<210> 8 

<211> 2235 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature

<223> Incyte ID No: 436857.2 

<220> 

<221> unsure 
<222> 289-319 

<223> a, t, c, g, or other 

<400> 8 

ctcatcccgt arctgcgcgt atgagacgca ttgtctcttc ctcrgcagtt. gagctgaang 60 
aatacctccg aagccgctct gccccccaga cgcgaacagc tccaccatac cagcctcgcc 120 
tt-ccttccgg gggacaacgt gggccagggc acagagagat a.tttiaacgtc acccrcctgg 18 0 
ggcttrcarg ggactccctc tgccacattt tccggaggcr gggaaagrcg ccagaggcct 240 
cagaactcca gcctaacgga rcccaaactc gggagaacgg ctgcg^iccnn rj-Lnnnnnnrm 3 00 
nnnnnnnnnn nnnnnnnnng cggcacgccc tcci:caccct ccccgccccc ggcgccgtta 3 60 
gagaaagtct tccagtacat tgaccrccat caggacgaac ttgcgcagac gctgaaggag 420 
tgggcggcca ccgagagcga ccccgcccag cccgcgcccc gctccagaca agagcrcttc 480 
agaargacgg ccgtggctgc ggacacgcrg cagcgcctgg gggcccgtgt ggcctcggcg 540 
gacacgggcc crcagcagct gcccganggt cagagccctc caatacctcc cgccarcctg 600 
gccgaaccgg ggagcgarcc cacgaaaggc accgrgtgct tccacggcca ctcggacgtg 660 
cagcccgctg accggggcga cgggcggccc acggacccct: atgcgccgac ggaggcagac 720 
gggaaacttr atggacgagg agcgaccgac aacaaaggcc ccgccttggc ctggaccaac 780 
gctgcgagcg cccccagagc cctggagcaa gaccttccrg tgaataccaa atncaccact 840 
gaggggatgg aagaggccgg ctccgc-gcc ccggaggaac ttgtggaaaa agaaaaggac 900 
cgatccttct ccggtgtgga ccacactgta actccagaca acccgcggat cagccaaagg 960 
aagccagcaa tcacttacgg aacccggggg aacagcnact ccarggcgga ggcgaaacgc 1020 
agagaccagg attttcactc aggaaccrct ggcggcatcc ctcacgaacc aatggcrgat 1080 
ctggttgccc tcctcggcag cctggcagac -cgcccggcc atatcccggt ccctggaacc 1140 
tatgatgaag tggttcctct tacagaagag gaaataaaca cacacaaagc catccatcta 1200 
gacccagaag aacaccggaa cagcagccgg gttgagaaat ttctgctcga taccaaggag 1260 
gagactccaa cgcacctctg gaggcaccca tcrctrtcca tccacgggat cgagggcgcg 1320 
ttcgatgagc ctggaactaa aacag^cata cctggccgag ccataggaaa atcttcaatc 13 80 
cgcctagtcc crcacatgaa tgcgtctgcg gtggaaaaac aggtgacacg acaccrcgaa 1440 
gatgtgttct ccaaaagaaa tagctccaac aagatggttg tttccatgac tccaggacna 1500 
cacccgcgga tcgcaaatat cgatgacacc cagtatcccg cagcaaaaag agcgaccaga 1560 
acagtgctrg gaacagaacc agatacgacc cgggacggat ccaccattcc aattgccaaa 1620 
acgntccagg agatcgtcca caagagcgtg gtgctaactc cgctgggagc cgttgatgac 1680 
ggagaacatt cgcagaacga gaaaaccaac aggrggaact acaragaggg aaccaaacta 1740 
tttgcngcct ttcrcrcaga gacggcccag ctccatcaar cacaagaacc trccagcctg 1800 
atccgatcca ctgacagart cacctccccc acatccccag acagggatgg aatgtaaata 1860 
cccagagaar rrgggtctag tatagtacar tttccccccc arrcaaaarg r.ct:t:gcgara 1920

rccggarcag r^aataaaata rrccaaaggc acaga::gcrg gaaacggrtr aaggcccccc 1980

accgcacacc ncccccaagr cacagccgci: cgcagcaac- cgarcrcccc aagccccgcg 2040

caacagcccc aggactggac cccctccaac ctcrcagcar: atctccaacc tcgcaarrrg 2100

actggcaTiaa tcaccccggt: ctgccttcua ggtccccaag cgctcgcgac acauaaccat: 2160

rccacccaac garcgccctt: gccctaccac ccctrccctit catcttacta ataaaaargc 2220

tggrctccac cacng 2235



<210> 9 

<211> 542 

<212> DNA 

<213> Homo sapiens 



<220> 

<221> misc_feature

<223> Incyte ID No: 247285.1.j



<400> 9 

cggggactga gggaagaagr gaaaaccgga ctgccaggcg acagtccctc cgttcgaaat 60
ctcgccggct ccrgagcggg ccaccgggcc cgggcrgggg gcctggcggg agaaacaact: 120
tcatttggac cgagagccgg agaacgagaa caggacctga gagtacaccg ggccaaggag 180
gagaggtgtt tgagcccaga rgagccacgg crggacgacc cctccgcata ggagatcagc 240
tggctccgga agaagattat: gatgagaccc acattcctag tgagcaagaa attcttgaar 300
ttgcccggga gatcggcatt gaccccacca aggaaccaga actgacgrgg ccggcgcgag 360
agggcaccgc ggccccactg cccggagagt ggaaaccatg ccaggacatc acaggtgaca 420
ttcaccactt caacttcgcc aacgggcagc ccatgcggga ccarccargt gacgaacact 480
atcggagccc ggtgacccaa gagcgggcaa agccgccaac ttctggggcc atcaagaaga 540
ag 542



<210> 10 

<211> 358 
<212> DNA 
<213> Homo sapiens 



<220> 

<221> misc_feature

<223> Incyte ID No: 254510.1.j



<400> 10 

cggacggcgc ggagtgactg tcccaccgcc gcgggattga cccctiaaaga ccccnggtac 60
ctgaggaaga aacccggaag aggaagagga gagcaaagga gccagggatg gctt:ct:cccc 120
agggrcnact gacarccagg gatgtggcca tagaanrctic tcaggaggag cggaaacgcc 180
tggaccccgc tcagaggact ctatacagag acgLgacgct ggagaattac aggaacccgg 240
tcnccccgga taccccctcc aaacgcacga cgaagacgtt ctcaccaaca ggacaaggca 300
atacagaagt ggcccacaca gggacaccgc aaatacatgc aagtcaccac actggaga 358



<210> 11 

<211> 1481 

<212> DNA 

<213> Homo sapiens 



<220> 

<221> misc_feature

<223> Incyte ID No: 284125.2.j



<400> 11

gtgttgcgcg accggccttg agggagagct ggggcctgcr cccggagaga tacggctacg 60
tcgatcgaaa tcgaatcccc ggacgtgacc cgcccuacca ngcagcacrc gaaggagaac 120
agtccacatc gggcgtcagc caccrtgcag gaggagacca ctgcgtccct gaacactgtg 180
gacagcattg agagtcttgc ggctgacart aacagcggcc attgggatac cgcgttgcag 240
gctatacagc ctccgaaact gccagacaaa accct-cattg acctctatga acaggtcgtc 300
ctggaarrga cagagctccg tgaarcgggt gccgccaggt: cacrrttgag acagaccga- 360

cccacgacca cgctaaaaca aacacagcca gagcgacaca tccacctgaa gaaccctrrg 42C 

gccaggccrr accticgatcc rcgtgaggca tacccagacg gaagragcaa agaaaagaga 480 

agagcagcaa tcgcccaggc cttagcrggc gaagrcagcg tggtgcctcc arcrcgccrc 540 

atggcattgc tgggacaggc actgaagcgg cagcagcacc agggactgct tccccccggr 500 

atgaccarag atccgttccg aggcaaggca gcngccaaag acgcggaaga agaaaagrrc 660 

cctacacaac cgagcaggca tattaagrtt ggccagaaat cacacgcgga gcgcgcrcga 720 

ctctccccag acggccagta rttggtcact gggtctgtitg atggatrcat tgaagcacgg 7SC 

aactttacta ctggaaaaac cagaaaggac cttaagcacc aggcccaaga taactirratg 840 

atgatggatg acgccgtcct ctgcatgcgt; tccagcagag atacagaaat gctagcaacc 900 

ggggcccaag acggaaaaat caaggcgcgg aagantzcaga gnggacaatg ttcaaggaga 960 

tttgagaggg cacacagraa gggtgrcacc tgtctaagct tctctaagga tagcagtcag 1020 

acccttagtg ctrcttttga ccagacaatt agaattcatg gcctaaaarc cgggaaaacc 1080 

ctgaaggaat tccgtggcca ttcctccctt: gctaacgaag caacatttac acaagatgga 1140 

cattacatta ttagcgcatc ctctgatggc accgcaaaga tccggaatac gaagaccaca 1200 

gaatgttcaa atacccttaa accccrgggc agcaccgcag ggaccagata ccaccgtcaa 12 6 0 

cagtgtgatt cracccccca aaaaccctga gcacctcgtg grgtgcaaca gatcaaacac 1320 

ggtggtcarc atgaacatgc aggggcagar. Cgccagaagc nccagcnctg gcaaaagaga 13 80 

aggtggggac ctcgtttgcc gtgcccr.ctc rccccgtggt gaacggatct accgcgtagg 1440 

ggaggacrrt gtgctctact gttrcagnac agccactggc a 1481 



<210> 12 
<211> 2439 
<212> DNA 

<213> Homo sapiens 



<220> 

<221> misc_feature 

<223> Incyte ID No: 331554.4.j



<223> a, t, c, g, or other
<400> 12 

ccggatntta gcgccagang cgcccccagc cgggcgggcg ncccagccat ggccccgcgc 60 

aaggaactgc tcaagtccat ccggcacgcc tctaccgcgc tggacgtgga gaagagtggc 120 

aaagtctcca agtcccagcc caaggtgccg tcccacaacc tgcacacggt cccgcacatc 180 

ccccatgacc ccgnggccct ggaggaacac ttccgagatg acgatgacgg ccccgcgtcc 240 

agccagggac acacgcccta ccccaacaag cacaccccgg acaaggtgga ggagggggct 3 00 

r;ct:gc.taaag agcaccttga tgagctgcgc cggacgccga cggccaagaa gaacraccgg 360 

gcagatagca acgggaacag tatgctctcc aatcaggatg ccttccgcct ccggtgcctc 420 

cccaacttcc tgtcr-gagga caagtacccc ccgancatgg cccccgatga ggtggaacac 480 

ctgcngaaaa aggtacccag cagcatgagc ttggaggtga gcttgggtga gccggaggag 540 

cntccggccc aggaggccca ggcggcccag accaccgggg ggcccagcgt ccggcagttc 600 

ctggagctcc tcaattcggg ccgncgcccg cggggcgtgg gccgggacaa ccticagcatg 660 

gccatccacg aggrccacca ggagctcacc caagacgtcc cgaagcaggg ccacccgngg 72 0 

aagcgagggc acctgagaag gaactgggcc gaacgccggc rccagctgca gcccagcngc 780 

ctctiggcnac ccrgggagcg aagagtgcaa agagaaaagg ggcactatcc cgctggacgc 840 

acaccgctgc gtggaggcgc cgccagaccg cgacggaaag cgct.gcatgt cctgcgtgaa 900 

gacagccacc cgcacgcatg agatgagcgc ctcagacacg cgccaggcca ggagcggaca 960 

gccgccaccc agatggcgat ccggctgcag gccgagggga agacgtccct acacaaggac 1020 

ctgaagcaga aacggcgcga gcagcgggag cagcgggagc gncgccgggc ggccaaggaa 10 80 

gaggagctgc tgcggctgca gcactgcagg aggagaagga gcggaagtgc aggagcngga 1140 

gctgctgcag gaggcgcacg gcaggccgag cggcrgctgc aggaggagga ggaacggcgc 12 00 

cgcagccagc accgcgagct gcagcaggcg ctcgagggcc aactgcgcga ggcggagcag 1260 

gcccgggcct ccatgcaggc cgagatggag ctgaaggagg aggaggctgc ccggcagcgg 1320 

cagcgcatcc aaggagccgg aggatatgca gcagcggtcg caggaggccc tgcaactaga 13 80 

ggtgaaagct cggcgagacg aagaatctgt gcgaaccgct cagaccagac tgccggaaga 1440 

ggaggaagag aagctgaagc agtitgacgca gccgaaggag gagcaggagc gccacatcga 1500 
acgggcgcac aggagaagga agagctgcag caggagargg cacagcagag ccgctcccrg 1560

cagcaggccc agcagcagct ggaggaggcg cggcagaacc ggcagagggc cgacgaggac 1520 

gcggaggctg cccagagaaa actgcgccag gccagcacca acgcgaaaca crggaacgcc 1680 

cagargaacc ggctgatgca tccaaccgag ccnggagara agcgrccggc caccagcagc 1740 

tcctrctcag gcrcccagcc ccctccgcct: gcccaccgtg actccrcccc aaagcgcccg 1800 

acccgccggg gatcccaggg caacaggacc ccccgcgccc aacagcaacg agcagcagaa i860 

gtcccccaat ggcggggatg aggctccrgc cccggczzcc acccctcagg aagataaacc 1920 

ggatccagca ccagaaaatt agcctcrccc agccccccgt tcctcccaat grcatatcca 1980 

ccaggacccg gccacagctg gcccgtgggt gatcccagct ctractagga gagggagcng 2040 

aggccctggt gccaggggcc caggccctcc aaccanaaac agrccaggat ggaacctggr 2100 

tcacccrcca caccagctcc aagccccaga cca::gggagc cgcctgggac gtcgatccct 2160 

gagaacttgg cccrgtgctt: tagacccaag gacccgactc ctgggctagg aaagagagaa 2220 

caagcaagcc ggggccaccr. gcccccaggt ggccaccaag ttgtggaagc acacttcraa 2280 

ataaaaacng cccttagaat gaactattgg cccaggcctg tccatctccc ctgccatrtc 2340 

ctcccrccct ccctcaagcc ccgfcatagg tcccaaagag cagtaaaagt ataacaaagt 2400 

ggttaagaaa gaccccgcag ctagactgcc tgggrtctg 2 439 



<210> 13 

<211> 1307 

<212> DNA 

<213> Homo sapiens 



<220> 

<221> misc_feature

<223> Incyte ID No: 331642.1.j



<220> 

<221> unsure 
<222> 891 

<223> a, t, c, g, or other



<400> 13 

ccggcccgct agccgccctg cgggacgccg gcgctgacgg gttggggaaa cggacgcctg 60 

gagaacggaa anccagttat caaaartgac ccaagaagag agaacccaac agaacaacaa 120 

caatggaaga aaccgggaac actaccacaa agctarcacc ccgccaaacc ccaggctcag 180 

acgccacagg ttaaaaaaaa gCccttcacg aaaaagaaag atcrtaagca gcatgatgga 240 

tccagaagct catgaaaaga ggccaccaat actaacatct: tcaaaacacg acatatcacc 3 00 

tcacactaca aatgctggtg agatgaagca ct.act:cgtgt ggctgctgcg ccgtaccgaa 3 60 

caacaccgca arcacatatc ccactcagaa ggtccrcttt cgacaacagc tgtatggcar 420 

caaaacccgg gatgcaacac ctcagtrgag aagggacgga cttcgaaatt tgtaccgtgg 480 

aatcctrccc ccactgatgc agaagacaac racgcztgca crracgcttg gcctgcatga 540 

ggatttatcc tgccccctcc acaagcatgt cagcgcccca gagtccgcaa ccagtggcgt 600 

ggcggcagrg cttgcaggga caacagaagc aatcrtcact ccactggaaa gagntcagac 660 

attgcntcaa gaccacaagc atcatgacaa atttaccaac acttaccagg cctccaaggc 720 

actgaaatgc catggaattg gagagratta tcgaggttgg tgcccattct tttccggaat 780 

ggacccagca angccttgtt cttcggcttc gaggtcccat caaggagcat ctgccraccg 840 

caacgactca cagtgctcat: ctggtcaarg actttanccg tggaggtcta ntgggtgcca 900 

tgttgggatt cctgcctttt ccaattaacg ttgraaaaac tcgcatacag tctcagattg 960 

gt.ggggaatt tcagcccttc cccaaggttc cccaaaaaat crggccggaa cgggacagaa 1020 

aactgataaa ccrctccaga ggcgcccacc cgaacracca tcggtccccc atctcttggg 1080 

gcacaatcaa cgcaacntar gagctcttgt taaaggttat atgaaaaaac catcagtraa 1140 

gtgccatcta ccaactgaac agacccrcna agaagaatgc agrttggccc ctttcttagc 1200 

tggccaaara caagtrggtg ncataacrcc aggccacagt gagttacggg caaagctgct 1260 

tcgcttaagc cccaacaaaa cagaataaaa gatnccaana ggaaaac 1307 



<210> 14 

<211> 303 

<212> DNA 

<213> Homo sapiens 

<220> 

<221> misc_feature 

<223> Incyte ID No: 445594.2.j 

<220> 

<221> unsure 
<222> 184 

<223> a, t, c, g, or other 
<400> 14 

gcgctctcgg cccacacaat atgaccrcgg ggaggacgcg aggaagatga actgcgacga 60 

cccacccctc ctcaatgaat gactgactta cccgagaaag aaactcagag gaagaggaaa 120 

gaaagaagag gagggaangg ctctcrctca gggaccgttt acattcaagg acgcggccat 180 

agancccrct caagaggagt gggagcgcct ggacccrgcc cagagggcct tgtacaggga 240 

cgtgatgttg gagaactaca ggaacccgcr ctctctcgat gaggataaca cccctccaga 300 

aga 303 
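
The listing format above (groups of ten residues followed by a running residue count) lends itself to a simple consistency check. The following is a minimal Python sketch, not part of the original filing, that joins the rows of SEQ ID NO:14 (with the two column halves of each row joined) and compares the result with the declared <211> length and the <222> position. The function name `parse_listing_rows` is a hypothetical helper introduced only for illustration, and the residues are taken as printed, including any characters that may be scanning errors.

```python
import re

def parse_listing_rows(rows):
    """Join sequence-listing rows into one lowercase string.

    Each row is assumed to hold space-separated groups of residues
    followed by a running residue count. Digits and whitespace are
    dropped; every other character is kept so damage stays visible.
    """
    return "".join(re.sub(r"[\d\s]", "", row).lower() for row in rows)

# Rows of SEQ ID NO:14 as printed in the listing.
rows = [
    "gcgctctcgg cccacacaat atgaccrcgg ggaggacgcg aggaagatga actgcgacga 60",
    "cccacccctc ctcaatgaat gactgactta cccgagaaag aaactcagag gaagaggaaa 120",
    "gaaagaagag gagggaangg ctctcrctca gggaccgttt acattcaagg acgcggccat 180",
    "agancccrct caagaggagt gggagcgcct ggacccrgcc cagagggcct tgtacaggga 240",
    "cgtgatgttg gagaactaca ggaacccgcr ctctctcgat gaggataaca cccctccaga 300",
    "aga 303",
]

seq14 = parse_listing_rows(rows)
declared_length = 303   # from the <211> field
unsure_position = 184   # from the <222> field, 1-based

print(len(seq14) == declared_length)   # True
print(seq14[unsure_position - 1])      # n
```

A mismatch between the joined length and the <211> value would suggest that residues were lost or added when the listing was transcribed.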






I0/0Q941I 



jcisf:i:V:?,r:-:; 30 m 2001 

PCT/US00/15344 



MOLECULES FOR DISEASE DETECTION AND TREATMENT 

TECHNICAL FIELD 

The present invention relates to molecules for disease detection and treatment and to the use 
of these sequences in the diagnosis, study, prevention, and treatment of diseases associated with 
disease detection and treatment molecules. 

BACKGROUND OF THE INVENTION 

The human genome comprises thousands of genes, many encoding gene products that 
function in the maintenance and growth of the various cells and tissues in the body. Aberrant 
expression of, or mutations in, these genes and their products are the cause of, or are associated with, a 
variety of human diseases such as cancer and other cell proliferative disorders. The identification of 
these genes and their products is the basis of an ever-expanding effort to find markers for early 
detection of diseases, and targets for their prevention and treatment. 

For example, cancer represents a type of cell proliferative disorder that affects nearly every 
tissue in the body. A wide variety of molecules, either aberrantly expressed or mutated, can be the 
cause of, or involved with, various cancers because tissue growth involves complex and ordered 
patterns of cell proliferation, cell differentiation, and apoptosis. Cell proliferation must be regulated 
to maintain both the number of cells and their spatial organization. This regulation depends upon the 
appropriate expression of proteins which control cell cycle progression in response to extracellular 
signals such as growth factors and other mitogens, and intracellular cues such as DNA damage or 
nutrient starvation. Molecules which directly or indirectly modulate cell cycle progression fall into 
several categories, including growth factors and their receptors, second messenger and signal 
transduction proteins, oncogene products, tumor-suppressor proteins, and mitosis-promoting factors. 

Aberrant expression or mutations in any of these gene products can result in cell proliferative 
disorders such as cancer. Oncogenes are genes generally derived from normal genes that, through 
abnormal expression or mutation, can effect the transformation of a normal cell to a malignant one 
(oncogenesis). Oncoproteins, encoded by oncogenes, can affect cell proliferation in a variety of ways 
and include growth factors, growth factor receptors, intracellular signal transducers, nuclear 
transcription factors, and cell-cycle control proteins. In contrast, tumor-suppressor genes are 
involved in inhibiting cell proliferation. Mutations which cause reduced or loss of function in 
tumor-suppressor genes result in aberrant cell proliferation and cancer. Thus a wide variety of genes 
and their products have been found that are associated with cell proliferative disorders such as cancer, 
but many more may exist that are yet to be discovered. 

DNA-based arrays can provide a simple way to explore the expression of a single 
polymorphic gene or a large number of genes. When the expression of a single gene is explored, 
DNA-based arrays are employed to detect the expression of specific gene variants. For example, a 
p53 tumor suppressor gene array is used to determine whether individuals are carrying mutations that 
predispose them to cancer. A cytochrome p450 gene array is useful to determine whether individuals 
have one of a number of specific mutations that could result in increased drug metabolism, drug 
resistance or drug toxicity. 

DNA-based array technology is especially relevant for the rapid screening of expression of a 
large number of genes. There is a growing awareness that gene expression is affected in a global 
fashion. A genetic predisposition, disease or therapeutic treatment may affect, directly or indirectly, 
the expression of a large number of genes. In some cases the interactions may be expected, such as 
when the genes are part of the same signaling pathway. In other cases, such as when the genes 
participate in separate signaling pathways, the interactions may be totally unexpected. Therefore, 
DNA-based arrays can be used to investigate how genetic predisposition, disease, or therapeutic 
treatment affects the expression of a large number of genes. 

The discovery of new molecules for disease detection and treatment satisfies a need in the art 
by providing new compositions which are useful in the diagnosis, study, prevention, and treatment of 
diseases. 

SUMMARY OF THE INVENTION 

The present invention relates to human polynucleotides encoding molecules for disease 
detection and treatment (mddt) as presented in the Sequence Listing. Some of the mddt uniquely 
identify genes encoding structural, functional, and regulatory molecules for disease detection and 
treatment. 

The invention provides an isolated polynucleotide comprising a polynucleotide sequence 
selected from the group consisting of a) a polynucleotide sequence selected from the group consisting 
of SEQ ID NO:1-14; b) a naturally occurring polynucleotide sequence having at least 90% sequence 
identity to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-14; c) a 
polynucleotide sequence complementary to a); d) a polynucleotide sequence complementary to b); 
and e) an RNA equivalent of a) through d). In one alternative, the polynucleotide comprises a 
polynucleotide sequence selected from the group consisting of SEQ ID NO:1-14. In another 
alternative, the polynucleotide comprises at least 60 contiguous nucleotides of a polynucleotide 
sequence selected from the group consisting of a) a polynucleotide sequence selected from the group 
consisting of SEQ ID NO:1-14; b) a naturally occurring polynucleotide sequence having at least 90% 
sequence identity to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-14; 
c) a polynucleotide sequence complementary to a); d) a polynucleotide sequence complementary 
to b); and e) an RNA equivalent of a) through d). The invention further provides a composition for 
the detection of expression of disease detection and treatment molecule polynucleotides comprising at 
least one isolated polynucleotide comprising a polynucleotide sequence selected from the group 
consisting of a) a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-14; b) 
a naturally occurring polynucleotide sequence having at least 90% sequence identity to a 
polynucleotide sequence selected from the group consisting of SEQ ID NO:1-14; c) a polynucleotide 
sequence complementary to a); d) a polynucleotide sequence complementary to b); and e) an RNA 
equivalent of a) through d); and a detectable label. 

The invention also provides a method for detecting a target polynucleotide in a sample, said 
target polynucleotide comprising a polynucleotide sequence selected from the group consisting of a) a 
polynucleotide sequence selected from the group consisting of SEQ ID NO:1-14; b) a naturally 
occurring polynucleotide sequence having at least 90% sequence identity to a polynucleotide 
sequence selected from the group consisting of SEQ ID NO:1-14; c) a polynucleotide sequence 
complementary to a); d) a polynucleotide sequence complementary to b); and e) an RNA equivalent 
of a) through d). The method comprises a) hybridizing the sample with a probe comprising at least 20 
contiguous nucleotides comprising a sequence complementary to said target polynucleotide in the 
sample, and which probe specifically hybridizes to said target polynucleotide, under conditions 
whereby a hybridization complex is formed between said probe and said target polynucleotide, and b) 
detecting the presence or absence of said hybridization complex, and, optionally, if present, the 
amount thereof. In one alternative, the probe comprises at least 30 contiguous nucleotides. In 
another alternative, the probe comprises at least 60 contiguous nucleotides. 

The invention further provides a recombinant polynucleotide comprising a promoter sequence 
operably linked to an isolated polynucleotide comprising a polynucleotide sequence selected from the 
group consisting of a) a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-14; 
b) a naturally occurring polynucleotide sequence having at least 90% sequence identity to a 
polynucleotide sequence selected from the group consisting of SEQ ID NO:1-14; c) a polynucleotide 
sequence complementary to a); d) a polynucleotide sequence complementary to b); and e) an RNA 
equivalent of a) through d). In one alternative, the invention provides a cell transformed with the 
recombinant polynucleotide. In another alternative, the invention provides a transgenic organism 
comprising the recombinant polynucleotide. In a further alternative, the invention provides a method 
for producing a disease detection and treatment molecule polypeptide, the method comprising a) 
culturing a cell under conditions suitable for expression of the disease detection and treatment 
molecule polypeptide, wherein said cell is transformed with the recombinant polynucleotide, and b) 
recovering the disease detection and treatment molecule polypeptide so expressed. 

The invention also provides a purified disease detection and treatment molecule polypeptide 
(MDDT) encoded by at least one polynucleotide comprising a polynucleotide sequence selected from 
the group consisting of SEQ ID NO:1-14. Additionally, the invention provides an isolated antibody 
which specifically binds to the disease detection and treatment molecule polypeptide. The invention 
further provides a method of identifying a test compound which specifically binds to the disease 
detection and treatment molecule polypeptide, the method comprising the steps of a) providing a test 
compound; b) combining the disease detection and treatment molecule polypeptide with the test 
compound for a sufficient time and under suitable conditions for binding; and c) detecting binding of 
the disease detection and treatment molecule polypeptide to the test compound, thereby identifying 
the test compound which specifically binds the disease detection and treatment molecule polypeptide. 

The invention further provides a microarray wherein at least one element of the microarray is 
an isolated polynucleotide comprising at least 60 contiguous nucleotides of a polynucleotide 
comprising a polynucleotide sequence selected from the group consisting of a) a polynucleotide 
sequence selected from the group consisting of SEQ ID NO:1-14; b) a naturally occurring 
polynucleotide sequence having at least 90% sequence identity to a polynucleotide sequence selected 
from the group consisting of SEQ ID NO:1-14; c) a polynucleotide sequence complementary to a); d) 
a polynucleotide sequence complementary to b); and e) an RNA equivalent of a) through d). The 
invention also provides a method of using the microarray for generating a transcript image of a 
sample which contains polynucleotides. The method comprises a) labeling the polynucleotides of the 
sample, b) contacting the elements of the microarray with the labeled polynucleotides of the sample 
under conditions suitable for the formation of a hybridization complex, and c) quantifying the 
expression of the polynucleotides in the sample. 

Additionally, the invention provides a method for screening a compound for effectiveness in 
altering expression of a target polynucleotide, wherein said target polynucleotide comprises a 
polynucleotide sequence selected from the group consisting of a) a polynucleotide sequence selected 
from the group consisting of SEQ ID NO:1-14; b) a naturally occurring polynucleotide sequence 
having at least 90% sequence identity to a polynucleotide sequence selected from the group 
consisting of SEQ ID NO:1-14; c) a polynucleotide sequence complementary to a); d) a 
polynucleotide sequence complementary to b); and e) an RNA equivalent of a) through d). The 
method comprises a) exposing a sample comprising the target polynucleotide to a compound, and b) 
detecting altered expression of the target polynucleotide. 

The invention further provides a method for detecting a target polynucleotide in a sample for 
toxicity testing of a compound, said target polynucleotide comprising a polynucleotide sequence 
selected from the group consisting of a) a polynucleotide sequence selected from the group consisting 
of SEQ ID NO:1-14; b) a naturally occurring polynucleotide sequence having at least 90% sequence 
identity to a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-14; c) a 
polynucleotide sequence complementary to a); d) a polynucleotide sequence complementary to b); 
and e) an RNA equivalent of a) through d). The method comprises a) hybridizing the sample with a 
probe comprising at least 20 contiguous nucleotides comprising a sequence complementary to said 
target polynucleotide in the sample, and which probe specifically hybridizes to said target 
polynucleotide, under conditions whereby a hybridization complex is formed between said probe and 
said target polynucleotide, b) detecting the presence or absence of said hybridization complex, and, 
optionally, if present, the amount thereof, and c) comparing the presence, absence or amount of said 
target polynucleotide in a first biological sample and a second biological sample, wherein said first 
biological sample has been contacted with said compound, and said second sample is a control, 
whereby a change in presence, absence or amount of said target polynucleotide in said first sample, as 
compared with said second sample, is indicative of toxic response to said compound. 

DESCRIPTION OF THE TABLES 

Table 1 shows the sequence identification numbers (SEQ ID NO:s) and template 
identification numbers (template IDs) corresponding to the polynucleotides of the present invention, 
along with their GenBank hits (GI Numbers), probability scores, and functional annotations 
corresponding to the GenBank hits. 

Table 2 shows the sequence identification numbers (SEQ ID NO:s) and template 
identification numbers (template IDs) corresponding to the polynucleotides of the present invention, 
along with polynucleotide segments of each template sequence as defined by the indicated "start" and 
"stop" nucleotide positions. The reading frames of the polynucleotide segments and the Pfam hits, 
Pfam descriptions, and E-values corresponding to the polypeptide domains encoded by the 
polynucleotide segments are indicated. 

Table 3 shows the sequence identification numbers (SEQ ID NO:s) and template 
identification numbers (template IDs) corresponding to the polynucleotides of the present invention, 
along with polynucleotide segments of each template sequence as defined by the indicated "start" and 
"stop" nucleotide positions. The reading frames of the polynucleotide segments are shown, and the 
polypeptides encoded by the polynucleotide segments constitute either signal peptide (SP) or 
transmembrane (TM) domains, as indicated. 

Table 4 shows the sequence identification numbers (SEQ ID NO:s) and template 
identification numbers (template IDs) corresponding to the polynucleotides of the present invention, 
along with component sequence identification numbers (component IDs) corresponding to each 
template. The component sequences, which were used to assemble the template sequences, are 
defined by the indicated "start" and "stop" nucleotide positions along each template. 

Table 5 summarizes the bioinformatics tools which are useful for analysis of the 
polynucleotides of the present invention. The first column of Table 5 lists analytical tools, programs, 
and algorithms, the second column provides brief descriptions thereof, the third column presents 
appropriate references, all of which are incorporated by reference herein in their entirety, and the 
fourth column presents, where applicable, the scores, probability values, and other parameters used to 
evaluate the strength of a match between two sequences (the higher the score, the greater the 
homology between two sequences). 

DETAILED DESCRIPTION OF THE INVENTION 

Before the nucleic acid sequences and methods are presented, it is to be understood that this 
invention is not limited to the particular machines, methods, and materials described. Although 
particular embodiments are described, machines, methods, and materials similar or equivalent to these 
embodiments may be used to practice the invention. The preferred machines, methods, and materials 
set forth are not intended to limit the scope of the invention which is limited only by the appended 
claims. 

The singular forms "a", "an", and "the" include plural reference unless the context clearly 
dictates otherwise. All technical and scientific terms have the meanings commonly understood by 
one of ordinary skill in the art. All publications are incorporated by reference for the purpose of 
describing and disclosing the cell lines, vectors, and methodologies which are presented and which 
might be used in connection with the invention. Nothing in the specification is to be construed as an 
admission that the invention is not entitled to antedate such disclosure by virtue of prior invention. 


Definitions 

As used herein, the lower case "mddt" refers to a nucleic acid sequence, while the upper case 
"MDDT" refers to an amino acid sequence encoded by mddt. A "full-length" mddt refers to a nucleic 
acid sequence containing the entire coding region of a gene endogenously expressed in human tissue. 

"Adjuvants" are materials such as Freund's adjuvant, mineral gels (aluminum hydroxide), and 
surface active substances (lysolecithin, pluronic polyols, polyanions, peptides, oil emulsions, keyhole 
limpet hemocyanin, and dinitrophenol) which may be administered to increase a host's 
immunological response. 

"Allele" refers to an alternative form of a nucleic acid sequence. Alleles result from a 

"mutation," a change or an alternative reading of the genetic code. Any given gene may have none, 
one, or many allelic forms. Mutations which give rise to alleles include deletions, additions, or 
substitutions of nucleotides. Each of these changes may occur alone, or in combination with the 
others, one or more times in a given nucleic acid sequence. The present invention encompasses 
allelic mddt. 

"Amino acid sequence" refers to a peptide, a polypeptide, or a protein of either natural or 
synthetic origin. The amino acid sequence is not limited to the complete, endogenous amino acid 
sequence and may be a fragment, epitope, variant, or derivative of a protein expressed by a nucleic 
acid sequence. 

"Amplification" refers to the production of additional copies of a sequence and is carried out 
using polymerase chain reaction (PCR) technologies well known in the art. 

"Antibody" refers to intact molecules as well as to fragments thereof, such as Fab, F(ab')2, 
and Fv fragments, which are capable of binding the epitopic determinant. Antibodies that bind 
MDDT polypeptides can be prepared using intact polypeptides or using fragments containing small 
peptides of interest as the immunizing antigen. The polypeptide or peptide used to immunize an 
animal (e.g., a mouse, a rat, or a rabbit) can be derived from the translation of RNA, or synthesized 
chemically, and can be conjugated to a carrier protein if desired. Commonly used carriers that are 
chemically coupled to peptides include bovine serum albumin, thyroglobulin, and keyhole limpet 
hemocyanin (KLH). The coupled peptide is then used to immunize the animal. 

"Antisense sequence" refers to a sequence capable of specifically hybridizing to a target 
sequence. The antisense sequence may include DNA, RNA, or any nucleic acid mimic or analog such 
as peptide nucleic acid (PNA); oligonucleotides having modified backbone linkages such as 
phosphorothioates, methylphosphonates, or benzylphosphonates; oligonucleotides having modified 
sugar groups such as 2'-methoxyethyl sugars or 2'-methoxyethoxy sugars; or oligonucleotides having 
modified bases such as 5-methyl cytosine, 2'-deoxyuracil, or 7-deaza-2'-deoxyguanosine. 

"Antisense technology" refers to any technology which relies on the specific hybridization of 
an antisense sequence to a target sequence. 

A "bin" is a portion of computer memory space used by a computer program for storage of 
data, and bounded in such a manner that data stored in a bin may be retrieved by the program. 

"Biologically active" refers to an amino acid sequence having a structural, regulatory, or 
biochemical function of a naturally occurring amino acid sequence. 

"Clone joining" is a process for combining gene bins based upon the bins' containing 
sequence information from the same clone. The sequences may assemble into a primary gene 
transcript as well as one or more splice variants. 

"Complementary" describes the relationship between two single-stranded nucleic acid 
sequences that anneal by base-pairing (5'-A-G-T-3' pairs with its complement 3'-T-C-A-5'). 
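
The base-pairing relationship in this definition can be illustrated with a short Python sketch. This is an illustration only, assuming unmodified DNA bases and standard Watson-Crick pairing; the function names are hypothetical and are not taken from the specification.

```python
# Watson-Crick pairing for the four unmodified DNA bases (an assumption;
# ambiguity codes and modified bases are not handled in this sketch).
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq):
    """Return the complementary strand, written 5'->3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def are_complementary(strand_a, strand_b):
    """True if two strands, both written 5'->3', pair along their full length."""
    return reverse_complement(strand_a) == strand_b.upper()

# 5'-A-G-T-3' pairs with 3'-T-C-A-5', which reads 5'-A-C-T-3'.
print(reverse_complement("AGT"))        # ACT
print(are_complementary("AGT", "ACT"))  # True
```

Writing both strands 5'->3' is a convention chosen for the sketch; the definition itself shows the complement in the 3'->5' direction.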

A "component sequence" is a nucleic acid sequence selected by a computer program such as 
PHRED and used to assemble a consensus or template sequence from one or more component 
sequences. 

A "consensus sequence" or "template sequence" is a nucleic acid sequence which has been 
assembled from overlapping sequences, using a computer program for fragment assembly such as the 
GELVIEW fragment assembly system (Genetics Computer Group (GCG), Madison WI) or using a 
relational database management system (RDMS). 

"Conservative amino acid substitutions" are those substitutions that, when made, least 
interfere with the properties of the original protein, i.e., the structure and especially the function of 
the protein is conserved and not significantly changed by such substitutions. The table below shows 
amino acids which may be substituted for an original amino acid in a protein and which are regarded 
as conservative substitutions. 





Original Residue      Conservative Substitution 

Ala                   Gly, Ser 
Arg                   His, Lys 
Asn                   Asp, Gln, His 
Asp                   Asn, Glu 
Cys                   Ala, Ser 
Gln                   Asn, Glu, His 
Glu                   Asp, Gln, His 
Gly                   Ala 
His                   Asn, Arg, Gln, Glu 
Ile                   Leu, Val 
Leu                   Ile, Val 
Lys                   Arg, Gln, Glu 
Met                   Leu, Ile 
Phe                   His, Met, Leu, Trp, Tyr 
Ser                   Cys, Thr 
Thr                   Ser, Val 
Trp                   Phe, Tyr 
Tyr                   His, Phe, Trp 
Val                   Ile, Leu, Thr 



Conservative substitutions generally maintain (a) the structure of the polypeptide backbone in 
the area of the substitution, for example, as a beta sheet or alpha helical conformation, (b) the charge 
or hydrophobicity of the molecule at the target site, or (c) the bulk of the side chain. 
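
The substitution table can be expressed as a lookup structure. The following Python sketch is an illustration only: the dictionary transcribes the table using standard three-letter residue codes, and the helper name `is_conservative` is hypothetical. The table is directional as written (for example, Ser is listed for Ala, but Ala is not listed for Ser), and the sketch preserves that.

```python
# Original residue -> residues the table lists as conservative substitutions.
CONSERVATIVE = {
    "Ala": {"Gly", "Ser"},
    "Arg": {"His", "Lys"},
    "Asn": {"Asp", "Gln", "His"},
    "Asp": {"Asn", "Glu"},
    "Cys": {"Ala", "Ser"},
    "Gln": {"Asn", "Glu", "His"},
    "Glu": {"Asp", "Gln", "His"},
    "Gly": {"Ala"},
    "His": {"Asn", "Arg", "Gln", "Glu"},
    "Ile": {"Leu", "Val"},
    "Leu": {"Ile", "Val"},
    "Lys": {"Arg", "Gln", "Glu"},
    "Met": {"Leu", "Ile"},
    "Phe": {"His", "Met", "Leu", "Trp", "Tyr"},
    "Ser": {"Cys", "Thr"},
    "Thr": {"Ser", "Val"},
    "Trp": {"Phe", "Tyr"},
    "Tyr": {"His", "Phe", "Trp"},
    "Val": {"Ile", "Leu", "Thr"},
}

def is_conservative(original, replacement):
    """True if the table lists `replacement` for `original`."""
    return replacement in CONSERVATIVE.get(original, set())

print(is_conservative("Ala", "Ser"))  # True
print(is_conservative("Ser", "Ala"))  # False: the table is directional
```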

"Deletion" refers to a change in either a nucleic or amino acid sequence in which at least one 
nucleotide or amino acid residue, respectively, is absent. 

"Derivative" refers to the chemical modification of a nucleic acid sequence, such as by 
replacement of hydrogen by an alkyl, acyl, amino, hydroxyl, or other group. 

The terms "element" and "array element" refer to a polynucleotide, polypeptide, or other 
chemical compound having a unique and defined position on a microarray. 

"E-value" refers to the statistical probability that a match between two sequences occurred by 
chance. 

A "fragment" is a unique portion of mddt or MDDT which is identical in sequence to but 
shorter in length than the parent sequence. A fragment may comprise up to the entire length of the 
defined sequence, minus one nucleotide/amino acid residue. For example, a fragment may comprise 
from 10 to 1000 contiguous amino acid residues or nucleotides. A fragment used as a probe, primer, 
antigen, therapeutic molecule, or for other purposes, may be at least 5, 10, 15, 16, 20, 25, 30, 40, 50, 
60, 75, 100, 150, 250 or at least 500 contiguous amino acid residues or nucleotides in length. 
Fragments may be preferentially selected from certain regions of a molecule. For example, a 
polypeptide fragment may comprise a certain length of contiguous amino acids selected from the first 
250 or 500 amino acids (or first 25% or 50%) of a polypeptide as shown in a certain defined 
sequence. Clearly these lengths are exemplary, and any length that is supported by the specification, 
including the Sequence Listing and the figures, may be encompassed by the present embodiments. 
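
The length constraints in this definition can be made concrete with a small Python sketch. It is an illustration only and covers just the contiguity and length conditions; whether a given fragment is "unique" would require comparison against other sequences and is not addressed. The function names are hypothetical.

```python
def fragments(seq, length):
    """Yield every contiguous fragment of `seq` with the given length.

    A fragment is shorter than its parent, so the longest allowed
    length is len(seq) - 1.
    """
    if not 1 <= length < len(seq):
        raise ValueError("a fragment must be shorter than the parent sequence")
    for start in range(len(seq) - length + 1):
        yield seq[start:start + length]

def leading_region(seq, fraction):
    """Return the first `fraction` (e.g. 0.25 or 0.5) of a sequence."""
    return seq[: int(len(seq) * fraction)]

parent = "ccggcccgctagccgccctg"   # first 20 nucleotides of SEQ ID NO:13
longest = list(fragments(parent, len(parent) - 1))
print(len(longest))                 # 2
print(leading_region(parent, 0.25)) # ccggc
```

A parent of length N has exactly two fragments of the maximum length N - 1, one lacking the last residue and one lacking the first.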

A fragment of mddt comprises a region of unique polynucleotide sequence that specifically 
identifies mddt, for example, as distinct from any other sequence in the same genome. A fragment of 
mddt is useful, for example, in hybridization and amplification technologies and in analogous 
methods that distinguish mddt from related polynucleotide sequences. The precise length of a 
fragment of mddt and the region of mddt to which the fragment corresponds are routinely 
determinable by one of ordinary skill in the art based on the intended purpose for the fragment. 

A fragment of MDDT is encoded by a fragment of mddt. A fragment of MDDT comprises a 
region of unique amino acid sequence that specifically identifies MDDT. For example, a fragment of 
MDDT is useful as an immunogenic peptide for the development of antibodies that specifically 
recognize MDDT. The precise length of a fragment of MDDT and the region of MDDT to which the 
fragment corresponds are routinely determinable by one of ordinary skill in the art based on the 
intended purpose for the fragment. 

A "full length" nucleotide sequence is one containing at least a start site for translation to a 
protein sequence, followed by an open reading frame and a stop site, and encoding a "full length" 
polypeptide. 

"Hit" refers to a sequence whose annotation will be used to describe a given template. 
Criteria for selecting the top hit are as follows: if the template has one or more exact nucleic acid 
matches, the top hit is the exact match with highest percent identity. If the template has no exact 
matches but has significant protein hits, the top hit is the protein hit with the lowest E-value. If the 
template has no significant protein hits, but does have significant non-exact nucleotide hits, the top hit 
is the nucleotide hit with the lowest E-value. 
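
The three-tier selection rule for the top hit can be sketched as follows. This Python example is an illustration under stated assumptions: the `Hit` record, its field names, and the example hits are hypothetical, and the decision of whether a hit is "significant" is assumed to have been made upstream, since no threshold is given in this passage.

```python
from dataclasses import dataclass

@dataclass
class Hit:
    name: str
    kind: str                 # "nucleotide" or "protein"
    exact: bool               # exact nucleic acid match?
    percent_identity: float
    e_value: float
    significant: bool = True  # assumed to be decided upstream

def top_hit(hits):
    """Apply the selection criteria in order; return None if nothing qualifies."""
    # 1. Exact nucleic acid matches: highest percent identity wins.
    exact = [h for h in hits if h.kind == "nucleotide" and h.exact]
    if exact:
        return max(exact, key=lambda h: h.percent_identity)
    # 2. Otherwise, significant protein hits: lowest E-value wins.
    proteins = [h for h in hits if h.kind == "protein" and h.significant]
    if proteins:
        return min(proteins, key=lambda h: h.e_value)
    # 3. Otherwise, significant non-exact nucleotide hits: lowest E-value wins.
    nucleotides = [h for h in hits
                   if h.kind == "nucleotide" and not h.exact and h.significant]
    if nucleotides:
        return min(nucleotides, key=lambda h: h.e_value)
    return None

hits = [
    Hit("hypothetical protein A", "protein", False, 62.0, 1e-40),
    Hit("hypothetical EST B", "nucleotide", False, 91.0, 1e-60),
]
print(top_hit(hits).name)  # hypothetical protein A
```

In the example the nucleotide hit has the lower E-value, yet the protein hit is selected, because protein hits take precedence over non-exact nucleotide hits.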
"Homology" refers to sequence similarity either between a reference nucleic acid sequence 
and at least a fragment of an mddt or between a reference amino acid sequence and a fragment of an 
MDDT. 

"Hybridization" refers to the process by which a strand of nucleotides anneals with a 
complementary strand through base pairing. Specific hybridization is an indication that two nucleic 
acid sequences share a high degree of identity. Specific hybridization complexes form under defined 
annealing conditions, and remain hybridized after the "washing" step. The defined hybridization 
conditions include the annealing conditions and the washing step(s), the latter of which is particularly 
important in determining the stringency of the hybridization process, with more stringent conditions 
allowing less non-specific binding, i.e., binding between pairs of nucleic acid probes that are not 
perfectly matched. Permissive conditions for annealing of nucleic acid sequences are routinely 
determinable and may be consistent among hybridization experiments, whereas wash conditions may 
be varied among experiments to achieve the desired stringency. 

Generally, stringency of hybridization is expressed with reference to the temperature under 
which the wash step is carried out. Generally, such wash temperatures are selected to be about 5°C to 
20°C lower than the thermal melting point (Tm) for the specific sequence at a defined ionic strength 
and pH. The Tm is the temperature (under defined ionic strength and pH) at which 50% of the target 
sequence hybridizes to a perfectly matched probe. An equation for calculating Tm and conditions for 
nucleic acid hybridization are well known and can be found in Sambrook et al., 1989, Molecular 
Cloning: A Laboratory Manual, 2nd ed., vol. 1-3, Cold Spring Harbor Press, Plainview NY; 
specifically see volume 2, chapter 9. 
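
The relationship between Tm and wash temperature can be sketched in Python. The specification does not reproduce the Tm equation, so the formula below is an assumption: it is one commonly cited approximation for long DNA duplexes, Tm = 81.5 + 16.6 log10[Na+] + 0.41(%G+C) - 600/N, and the exact form should be checked against the cited manual. The function names and the example values are hypothetical.

```python
import math

def melting_temperature(seq, na_molar):
    """Approximate Tm in deg C for a DNA duplex (assumed formula, see lead-in).

    seq      -- sequence of the hybridizing strand
    na_molar -- monovalent cation concentration in mol/L
    """
    seq = seq.upper()
    n = len(seq)
    gc_percent = 100.0 * (seq.count("G") + seq.count("C")) / n
    return 81.5 + 16.6 * math.log10(na_molar) + 0.41 * gc_percent - 600.0 / n

def wash_temperature_range(tm):
    """Wash temperatures about 5 to 20 deg C below Tm, as (low, high)."""
    return tm - 20.0, tm - 5.0

# Hypothetical 100-nucleotide probe with 50% G+C at 1.0 M monovalent cation.
probe = "GC" * 25 + "AT" * 25
tm = melting_temperature(probe, na_molar=1.0)
low, high = wash_temperature_range(tm)
print(round(tm, 1), round(low, 1), round(high, 1))  # 96.0 76.0 91.0
```

Lowering the salt concentration lowers the computed Tm, which is consistent with low-salt washes being the more stringent ones.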

High stringency conditions for hybridization between polynucleotides of the present 
invention include wash conditions of 68°C in the presence of about 0.2 x SSC and about 0.1% SDS, 
for 1 hour. Alternatively, temperatures of about 65°C, 60°C, or 55°C may be used. SSC 
concentration may be varied from about 0.2 to 2 x SSC, with SDS being present at about 0.1%. 
Typically, blocking reagents are used to block non-specific hybridization. Such blocking reagents 
include, for instance, denatured salmon sperm DNA at about 100-200 µg/ml. Useful variations on 
these conditions will be readily apparent to those skilled in the art. Hybridization, particularly under 
high stringency conditions, may be suggestive of evolutionary similarity between the nucleotides. 
Such similarity is strongly indicative of a similar role for the nucleotides and their resultant proteins. 

Other parameters, such as temperature, salt concentration, and detergent concentration may 
be varied to achieve the desired stringency. Denaturants, such as formamide at a concentration of 
about 35-50% v/v, may also be used under particular circumstances, such as RNA;DNA 
hybridizations. Appropriate hybridization conditions are routinely determinable by one of ordinary 
skill in the art. 
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"Immunogenic" descnbes the potential for a natural, recombinant, or synthetic peptide, 
epitope, polypeptide, or protem to induce antibody production in appropriate animals, cells, or cell 
lines. 

"Insertion" or "addition" refers to a change in either a nucleic or ammo acid sequence in 
5 which at least one nucleotide or residue, respectively, is added to the sequence. 

"Labeling" refers to the covalent or noncovalent joining of a polynucleotide, polypeptide, or 
antibody with a reporter molecule capable of producing a detectable or measurable signal.

"Microarray" is any arrangement of nucleic acids, amino acids, antibodies, etc., on a
substrate. The substrate may be a solid support such as beads, glass, paper, nitrocellulose, nylon, or
an appropriate membrane.

"Linkers" are short stretches of nucleotide sequence which may be added to a vector or an 
mddt to create restriction endonuclease sites to facilitate cloning. "Polylinkers" are engineered to
incorporate multiple restriction enzyme sites and to provide for the use of enzymes which leave 5' or
3' overhangs (e.g., BamHI, EcoRI, and HindIII) and those which provide blunt ends (e.g., EcoRV,
SnaBI, and StuI).

"Naturally occurring" refers to an endogenous polynucleotide or polypeptide that may be 
isolated from viruses or prokaryotic or eukaryotic cells. 

"Nucleic acid sequence" refers to the specific order of nucleotides joined by phosphodiester 
bonds in a linear, polymeric arrangement. Depending on the number of nucleotides, the nucleic acid
sequence can be considered an oligomer, oligonucleotide, or polynucleotide. The nucleic acid can be
DNA, RNA, or any nucleic acid analog, such as PNA, may be of genomic or synthetic origin, may be
either double-stranded or single-stranded, and can represent either the sense or antisense
(complementary) strand.
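
As a brief illustration of the sense and antisense (complementary) strands mentioned above, the following Python sketch derives one strand from the other for a DNA sequence. The function name `antisense_strand` is hypothetical, and only the four standard bases are handled; it is an illustrative assumption, not part of the disclosed methods.

```python
# Map each standard DNA base to its Watson-Crick partner (case preserved).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def antisense_strand(sense: str) -> str:
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return sense.translate(_COMPLEMENT)[::-1]


if __name__ == "__main__":
    print(antisense_strand("ATGC"))
```

Characters outside A, C, G, and T (for example an unidentified base N) pass through unchanged in this sketch.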

"Oligomer" refers to a nucleic acid sequence of at least about 6 nucleotides and as many as 
about 60 nucleotides, preferably about 15 to 40 nucleotides, and most preferably between about 20
and 30 nucleotides, that may be used in hybridization or amplification technologies. Oligomers may
be used as, e.g., primers for PCR, and are usually chemically synthesized.

"Operably linked" refers to the situation in which a first nucleic acid sequence is placed in a 
functional relationship with the second nucleic acid sequence. For instance, a promoter is operably 
linked to a coding sequence if the promoter affects the transcription or expression of the coding
sequence. Generally, operably linked DNA sequences may be in close proximity or contiguous and,
where necessary to join two protein coding regions, in the same reading frame.

"Peptide nucleic acid" (PNA) refers to a DNA mimic in which nucleotide bases are attached 
to a pseudopeptide backbone to increase stability. PNAs, also designated antigene agents, can 
prevent gene expression by targeting complementary messenger RNA.



The phrases "percent identity" and "% identity", as applied to polynucleotide sequences,
refer to the percentage of residue matches between at least two polynucleotide sequences aligned
using a standardized algorithm. Such an algorithm may insert, in a standardized and reproducible
way, gaps in the sequences being compared in order to optimize alignment between two sequences,
and therefore achieve a more meaningful comparison of the two sequences.

Percent identity between polynucleotide sequences may be determined using the default
parameters of the CLUSTAL V algorithm as incorporated into the MEGALIGN version 3.12e
sequence alignment program. This program is part of the LASERGENE software package, a suite of
molecular biological analysis programs (DNASTAR, Madison WI). CLUSTAL V is described in
Higgins, D.G. and Sharp, P.M. (1989) CABIOS 5:151-153 and in Higgins, D.G. et al. (1992)
CABIOS 8:189-191. For pairwise alignments of polynucleotide sequences, the default parameters are
set as follows: Ktuple=2, gap penalty=5, window=4, and "diagonals saved"=4. The "weighted"
residue weight table is selected as the default. Percent identity is reported by CLUSTAL V as the
"percent similarity" between aligned polynucleotide sequence pairs.

Alternatively, a suite of commonly used and freely available sequence comparison algorithms
is provided by the National Center for Biotechnology Information (NCBI) Basic Local Alignment
Search Tool (BLAST) (Altschul, S.F. et al. (1990) J. Mol. Biol. 215:403-410), which is available
from several sources, including the NCBI, Bethesda, MD, and on the Internet at
http://www.ncbi.nlm.nih.gov/BLAST/. The BLAST software suite includes various sequence
analysis programs including "blastn," that is used to determine alignment between a known
polynucleotide sequence and other sequences on a variety of databases. Also available is a tool called
"BLAST 2 Sequences" that is used for direct pairwise comparison of two nucleotide sequences.
"BLAST 2 Sequences" can be accessed and used interactively at
http://www.ncbi.nlm.nih.gov/gorf/bl2/. The "BLAST 2 Sequences" tool can be used for both blastn
and blastp (discussed below). BLAST programs are commonly used with gap and other parameters
set to default settings. For example, to compare two nucleotide sequences, one may use blastn with
the "BLAST 2 Sequences" tool Version 2.0.9 (May-07-1999) set at default parameters. Such default
parameters may be, for example:
Matrix: BLOSUM62
Reward for match: 1
Penalty for mismatch: -2
Open Gap: 5 and Extension Gap: 2 penalties
Gap X drop-off: 50
Expect: 10
Word Size: 11
Filter: on
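
To make the match, mismatch, and gap parameters listed above concrete, the following Python sketch scores an alignment that has already been computed. It is not the BLAST implementation: the function name `score_alignment` is hypothetical, the aligned strings use "-" for gap positions, and the sketch assumes that a gap of length k costs the open penalty plus k times the extension penalty.

```python
def score_alignment(query, subject, reward=1, penalty=-2, gap_open=5, gap_extend=2):
    """Score two aligned, equal-length strings in which '-' marks a gap position."""
    if len(query) != len(subject):
        raise ValueError("aligned strings must have equal length")
    score = 0
    gap_side = None  # which sequence the current run of gaps belongs to
    for q, s in zip(query, subject):
        if q == "-" and s == "-":
            raise ValueError("a column cannot be a gap in both sequences")
        if q == "-" or s == "-":
            side = "query" if q == "-" else "subject"
            if side != gap_side:
                score -= gap_open  # charged once per run of gaps
                gap_side = side
            score -= gap_extend  # charged for every gap position
        else:
            gap_side = None
            score += reward if q == s else penalty
    return score


if __name__ == "__main__":
    print(score_alignment("AC--GT", "ACTTGT"))
```

Under these assumptions, raising the mismatch or gap penalties relative to the match reward favors shorter, more exact alignments.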

Percent identity may be measured over the length of an entire defined sequence, for example,
as defined by a particular SEQ ID number, or may be measured over a shorter length, for example,
over the length of a fragment taken from a larger, defined sequence, for instance, a fragment of at
least 20, at least 30, at least 40, at least 50, at least 70, at least 100, or at least 200 contiguous
nucleotides. Such lengths are exemplary only, and it is understood that any fragment length
supported by the sequences shown herein, in figures or Sequence Listings, may be used to describe a
length over which percentage identity may be measured.
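
The measurement of percent identity over a full alignment or over a shorter window can be sketched in Python as follows. The function name `percent_identity` and its `start`/`end` parameters are hypothetical, and the sketch assumes the denominator is the number of aligned columns in the chosen window, gaps included; other conventions exist, and the programs named above may differ.

```python
def percent_identity(aligned_a, aligned_b, start=0, end=None):
    """Percentage of identical residues among aligned columns in [start, end)."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned strings must have equal length")
    columns = list(zip(aligned_a.upper()[start:end], aligned_b.upper()[start:end]))
    if not columns:
        raise ValueError("the selected window is empty")
    # A column counts as a match only if both residues agree and neither is a gap.
    matches = sum(1 for a, b in columns if a == b and a != "-")
    return 100.0 * matches / len(columns)


if __name__ == "__main__":
    print(percent_identity("ACGT-A", "ACCTTA"))
    print(percent_identity("ACGTAA", "ACGTCC", 0, 4))
```

Restricting the window, as in the second call, corresponds to measuring identity over a fragment rather than over the entire defined sequence.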

Nucleic acid sequences that do not show a high degree of identity may nevertheless encode 
similar amino acid sequences due to the degeneracy of the genetic code. It is understood that changes
in nucleic acid sequence can be made using this degeneracy to produce multiple nucleic acid 
sequences that all encode substantially the same protein. 
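
As an illustration of this degeneracy, the Python sketch below translates two different nucleotide sequences into the same peptide. The two sequences are made-up examples, the names `CODON_TABLE` and `translate` are hypothetical, and the table is deliberately a small subset of the standard genetic code, so any codon outside it raises a `KeyError`.

```python
# Subset of the standard genetic code (DNA codons), enough for this example only.
CODON_TABLE = {
    "ATG": "M",
    "GCT": "A", "GCC": "A", "GCA": "A", "GCG": "A",
    "CTT": "L", "CTC": "L", "CTA": "L", "CTG": "L", "TTA": "L", "TTG": "L",
    "TCT": "S", "TCC": "S", "TCA": "S", "TCG": "S", "AGT": "S", "AGC": "S",
}


def translate(dna):
    """Translate a DNA coding sequence, read in frame from its first base."""
    dna = dna.upper()
    if len(dna) % 3 != 0:
        raise ValueError("sequence length must be a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


if __name__ == "__main__":
    first = "ATGGCTCTTTCT"
    second = "ATGGCCTTGAGC"
    identical = sum(1 for a, b in zip(first, second) if a == b)
    print(translate(first), translate(second), identical, "of", len(first))
```

The two inputs agree at only half of their nucleotide positions yet encode the same four-residue peptide.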

The phrases "percent identity" and "% identity", as applied to polypeptide sequences, refer to
the percentage of residue matches between at least two polypeptide sequences aligned using a
standardized algorithm. Methods of polypeptide sequence alignment are well-known. Some
alignment methods take into account conservative amino acid substitutions. Such conservative
substitutions, explained in more detail above, generally preserve the hydrophobicity and acidity of the
substituted residue, thus preserving the structure (and therefore function) of the folded polypeptide.

Percent identity between polypeptide sequences may be determined using the default
parameters of the CLUSTAL V algorithm as incorporated into the MEGALIGN version 3.12e
sequence alignment program (described and referenced above). For pairwise alignments of
polypeptide sequences using CLUSTAL V, the default parameters are set as follows: Ktuple=1, gap
penalty=3, window=5, and "diagonals saved"=5. The PAM250 matrix is selected as the default
residue weight table. As with polynucleotide alignments, the percent identity is reported by
CLUSTAL V as the "percent similarity" between aligned polypeptide sequence pairs.

Alternatively, the NCBI BLAST software suite may be used. For example, for a pairwise
comparison of two polypeptide sequences, one may use the "BLAST 2 Sequences" tool Version 2.0.9
(May-07-1999) with blastp set at default parameters. Such default parameters may be, for example:
Matrix: BLOSUM62
Open Gap: 11 and Extension Gap: 1 penalty
Gap X drop-off: 50
Expect: 10
Word Size: 3
Filter: on

Percent identity may be measured over the length of an entire defined polypeptide sequence,
for example, as defined by a particular SEQ ID number, or may be measured over a shorter length, for
example, over the length of a fragment taken from a larger, defined polypeptide sequence, for
instance, a fragment of at least 15, at least 20, at least 30, at least 40, at least 50, at least 70 or at least
150 contiguous residues. Such lengths are exemplary only, and it is understood that any fragment
length supported by the sequences shown herein, in figures or Sequence Listings, may be used to
describe a length over which percentage identity may be measured.

"Post-translational modification" of an MDDT may involve lipidation, glycosylation,
phosphorylation, acetylation, racemization, proteolytic cleavage, and other modifications known in
the art. These processes may occur synthetically or biochemically. Biochemical modifications will
vary by cell type depending on the enzymatic milieu and the MDDT.

"Probe" refers to mddt or fragments thereof, which are used to detect identical, allelic or 
related nucleic acid sequences. Probes are isolated oligonucleotides or polynucleotides attached to a 
detectable label or reporter molecule. Typical labels include radioactive isotopes, ligands,
chemiluminescent agents, and enzymes. "Primers" are short nucleic acids, usually DNA
oligonucleotides, which may be annealed to a target polynucleotide by complementary base-pairing.
The primer may then be extended along the target DNA strand by a DNA polymerase enzyme.
Primer pairs can be used for amplification (and identification) of a nucleic acid sequence, e.g., by the
polymerase chain reaction (PCR).

Probes and primers as used in the present invention typically comprise at least 15 contiguous
nucleotides of a known sequence. In order to enhance specificity, longer probes and primers may also
be employed, such as probes and primers that comprise at least 20, 30, 40, 50, 60, 70, 80, 90, 100, or
at least 150 consecutive nucleotides of the disclosed nucleic acid sequences. Probes and primers may
be considerably longer than these examples, and it is understood that any length supported by the
specification, including the figures and Sequence Listing, may be used.
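
The idea of taking contiguous stretches of a known sequence as candidate probes or primers can be sketched in Python as below. The function name `contiguous_fragments` is hypothetical, and the sketch only enumerates windows of a chosen length; it does not assess melting temperature, specificity, or any other criterion applied by the primer selection programs discussed in this specification.

```python
def contiguous_fragments(sequence, length=15):
    """Return every contiguous window of the given length, in order of position."""
    if length < 1:
        raise ValueError("length must be a positive integer")
    # range() is empty when the sequence is shorter than the window.
    return [sequence[i:i + length] for i in range(len(sequence) - length + 1)]


if __name__ == "__main__":
    candidates = contiguous_fragments("ATGGCTCTTTCTAGCGGATTACA", length=15)
    print(len(candidates), candidates[0])
```

A longer `length` yields fewer, more specific candidates, in line with the longer probes and primers described above.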

Methods for preparing and using probes and primers are described in the references, for
example Sambrook et al., 1989, Molecular Cloning: A Laboratory Manual, 2nd ed., vol. 1-3, Cold
Spring Harbor Press, Plainview NY; Ausubel et al., 1987, Current Protocols in Molecular Biology,
Greene Publ. Assoc. & Wiley-Intersciences, New York NY; Innis et al., 1990, PCR Protocols, A
Guide to Methods and Applications, Academic Press, San Diego CA. PCR primer pairs can be
derived from a known sequence, for example, by using computer programs intended for that purpose
such as Primer (Version 0.5, 1991, Whitehead Institute for Biomedical Research, Cambridge MA).

Oligonucleotides for use as primers are selected using software known in the art for such
purpose. For example, OLIGO 4.06 software is useful for the selection of PCR primer pairs of up to
100 nucleotides each, and for the analysis of oligonucleotides and larger polynucleotides of up to
5,000 nucleotides from an input polynucleotide sequence of up to 32 kilobases. Similar primer
selection programs have incorporated additional features for expanded capabilities. For example, the
PrimOU primer selection program (available to the public from the Genome Center at University of
Texas South West Medical Center, Dallas TX) is capable of choosing specific primers from
megabase sequences and is thus useful for designing primers on a genome-wide scope. The Primer3
primer selection program (available to the public from the Whitehead Institute/MIT Center for
Genome Research, Cambridge MA) allows the user to input a "mispriming library," in which
sequences to avoid as primer binding sites are user-specified. Primer3 is useful, in particular, for the
selection of oligonucleotides for microarrays. (The source code for the latter two primer selection
programs may also be obtained from their respective sources and modified to meet the user's specific
needs.) The PrimeGen program (available to the public from the UK Human Genome Mapping
Project Resource Centre, Cambridge UK) designs primers based on multiple sequence alignments,
thereby allowing selection of primers that hybridize to either the most conserved or least conserved
regions of aligned nucleic acid sequences. Hence, this program is useful for identification of both
unique and conserved oligonucleotides and polynucleotide fragments. The oligonucleotides and
polynucleotide fragments identified by any of the above selection methods are useful in hybridization
technologies, for example, as PCR or sequencing primers, microarray elements, or specific probes to
identify fully or partially complementary polynucleotides in a sample of nucleic acids. Methods of
oligonucleotide selection are not limited to those described above.

"Purified" refers to molecules, either polynucleotides or polypeptides that are isolated or 
separated from their natural environment and are at least 60% free, preferably at least 75% free, and
most preferably at least 90% free from other compounds with which they are naturally associated. 

A "recombinant nucleic acid" is a sequence that is not naturally occurring or has a sequence
that is made by an artificial combination of two or more otherwise separated segments of sequence.
This artificial combination is often accomplished by chemical synthesis or, more commonly, by the
artificial manipulation of isolated segments of nucleic acids, e.g., by genetic engineering techniques
such as those described in Sambrook, supra. The term recombinant includes nucleic acids that have
been altered solely by addition, substitution, or deletion of a portion of the nucleic acid. Frequently, a
recombinant nucleic acid may include a nucleic acid sequence operably linked to a promoter
sequence. Such a recombinant nucleic acid may be part of a vector that is used, for example, to
transform a cell.

Alternatively, such recombinant nucleic acids may be part of a viral vector, e.g., based on a
vaccinia virus, that could be used to vaccinate a mammal wherein the recombinant nucleic acid is
expressed, inducing a protective immunological response in the mammal.

"Regulatory element" refers to a nucleic acid sequence from nontranslated regions of a gene,
and includes enhancers, promoters, introns, and 3' untranslated regions, which interact with host
proteins to carry out or regulate transcription or translation.

"Reporter" molecules are chemical or biochemical moieties used for labeling a nucleic acid, 
an amino acid, or an antibody. They include radionuclides; enzymes; fluorescent, chemiluminescent,
or chromogenic agents; substrates; cofactors; inhibitors; magnetic particles; and other moieties known
in the art.

An "RNA equivalent," in reference to a DNA sequence, is composed of the same linear 
sequence of nucleotides as the reference DNA sequence with the exception that all occurrences of the 
nitrogenous base thymine are replaced with uracil, and the sugar backbone is composed of ribose 
instead of deoxyribose. 
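
At the level of the written sequence, the RNA equivalent defined above amounts to replacing each thymine with uracil, as in the following minimal Python sketch. The function name `rna_equivalent` is hypothetical; the change of sugar backbone from deoxyribose to ribose is a chemical difference that has no counterpart in the sequence text.

```python
# Replace thymine with uracil, preserving upper or lower case.
_T_TO_U = str.maketrans("Tt", "Uu")


def rna_equivalent(dna):
    """Return the RNA equivalent of a DNA sequence, leaving all other bases unchanged."""
    return dna.translate(_T_TO_U)


if __name__ == "__main__":
    print(rna_equivalent("ATGCTT"))
```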

"Sample" is used in its broadest sense. Samples may contain nucleic or amino acids, 
antibodies, or other materials, and may be derived from any source (e.g., bodily fluids including, but
not limited to, saliva, blood, and urine; chromosome(s), organelles, or membranes isolated from a
cell; genomic DNA, RNA, or cDNA in solution or bound to a substrate; and cleared cells or tissues or
blots or imprints from such cells or tissues).

"Specific binding" or "specifically binding" refers to the interaction between a protein or 
peptide and its agonist, antibody, antagonist, or other binding partner. The interaction is dependent 
upon the presence of a particular structure of the protein, e.g., the antigenic determinant or epitope, 
recognized by the binding molecule. For example, if an antibody is specific for epitope "A," the
presence of a polypeptide containing epitope A, or the presence of free unlabeled A, in a reaction
containing free labeled A and the antibody will reduce the amount of labeled A that binds to the 
antibody.

"Substitution" refers to the replacement of at least one nucleotide or amino acid by a different
nucleotide or amino acid.

"Substrate" refers to any suitable rigid or semi-rigid support including, e.g., membranes,
filters, chips, slides, wafers, fibers, magnetic or nonmagnetic beads, gels, tubing, plates, polymers,
microparticles or capillaries. The substrate can have a variety of surface forms, such as wells,
trenches, pins, channels and pores, to which polynucleotides or polypeptides are bound.

A "transcript image" refers to the collective pattern of gene expression by a particular tissue 
or cell type under given conditions at a given time. 

"Transformation" refers to a process by which exogenous DNA enters a recipient cell. 
Transformation may occur under natural or artificial conditions using various methods well known in 
the art. Transformation may rely on any known method for the insertion of foreign nucleic acid
sequences into a prokaryotic or eukaryotic host cell. The method is selected based on the host cell
being transformed.

"Transformants" include stably transformed cells in which the inserted DNA is capable of
replication either as an autonomously replicating plasmid or as part of the host chromosome, as well
as cells which transiently express inserted DNA or RNA.

A "transgenic organism," as used herein, is any organism, including but not limited to animals
and plants, in which one or more of the cells of the organism contains heterologous nucleic acid
introduced by way of human intervention, such as by transgenic techniques well known in the art.
The nucleic acid is introduced into the cell, directly or indirectly by introduction into a precursor of
the cell, by way of deliberate genetic manipulation, such as by microinjection or by infection with a
recombinant virus. The term genetic manipulation does not include classical cross-breeding, or in
vitro fertilization, but rather is directed to the introduction of a recombinant DNA molecule. The
transgenic organisms contemplated in accordance with the present invention include bacteria,
cyanobacteria, fungi, and plants and animals. The isolated DNA of the present invention can be
introduced into the host by methods known in the art, for example infection, transfection,
transformation or transconjugation. Techniques for transferring the DNA of the present invention
into such organisms are widely known and provided in references such as Sambrook et al. (1989),
supra.

A "variant" of a particular nucleic acid sequence is defined as a nucleic acid sequence having
at least 25% sequence identity to the particular nucleic acid sequence over a certain length of one of
the nucleic acid sequences using blastn with the "BLAST 2 Sequences" tool Version 2.0.9
(May-07-1999) set at default parameters. Such a pair of nucleic acids may show, for example, at least 30%, at
least 50%, at least 60%, at least 70%, at least 80%, at least 90%, at least 95% or even at least 98% or
greater sequence identity over a certain defined length. The variant may result in "conservative"
amino acid changes which do not affect structural and/or chemical properties. A variant may be
described as, for example, an "allelic" (as defined above), "splice," "species," or "polymorphic"
variant. A splice variant may have significant identity to a reference molecule, but will generally
have a greater or lesser number of polynucleotides due to alternate splicing of exons during mRNA
processing. The corresponding polypeptide may possess additional functional domains or lack
domains that are present in the reference molecule. Species variants are polynucleotide sequences
that vary from one species to another. The resulting polypeptides generally will have significant
amino acid identity relative to each other. A polymorphic variant is a variation in the polynucleotide
sequence of a particular gene between individuals of a given species. Polymorphic variants also may
encompass "single nucleotide polymorphisms" (SNPs) in which the polynucleotide sequence varies
by one base. The presence of SNPs may be indicative of, for example, a certain population, a disease
state, or a propensity for a disease state.
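
A single nucleotide polymorphism of the kind described above can be located by comparing two versions of the same stretch of sequence position by position. The Python sketch below is illustrative only: the function name `single_base_differences` is hypothetical, and it assumes the two sequences are already aligned without gaps and have equal length.

```python
def single_base_differences(reference, other):
    """Return (position, reference_base, other_base) for each mismatch (0-based)."""
    if len(reference) != len(other):
        raise ValueError("sequences must have equal length")
    return [
        (i, a, b)
        for i, (a, b) in enumerate(zip(reference.upper(), other.upper()))
        if a != b
    ]


if __name__ == "__main__":
    print(single_base_differences("ACGTAC", "ACGCAC"))
```

Each reported position is a candidate single-base variation; whether it is a true polymorphism or a sequencing error cannot be decided from two sequences alone.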

In an alternative, variants of the polynucleotides of the present invention may be generated
through recombinant methods. One possible method is a DNA shuffling technique such as
MOLECULARBREEDING (Maxygen Inc., Santa Clara CA; described in U.S. Patent Number
5,837,458; Chang, C.-C. et al. (1999) Nat. Biotechnol. 17:793-797; Christians, F.C. et al. (1999) Nat.
Biotechnol. 17:259-264; and Crameri, A. et al. (1996) Nat. Biotechnol. 14:315-319) to alter or
improve the biological properties of MDDT, such as its biological or enzymatic activity or its ability
to bind to other molecules or compounds. DNA shuffling is a process by which a library of gene
variants is produced using PCR-mediated recombination of gene fragments. The library is then
subjected to selection or screening procedures that identify those gene variants with the desired
properties. These preferred variants may then be pooled and further subjected to recursive rounds of
DNA shuffling and selection/screening. Thus, genetic diversity is created through "artificial"
breeding and rapid molecular evolution. For example, fragments of a single gene containing random
point mutations may be recombined, screened, and then reshuffled until the desired properties are
optimized. Alternatively, fragments of a given gene may be recombined with fragments of
homologous genes in the same gene family, either from the same or different species, thereby
maximizing the genetic diversity of multiple naturally occurring genes in a directed and controllable
manner.

A "variant" of a particular polypeptide sequence is defined as a polypeptide sequence having
at least 40% sequence identity to the particular polypeptide sequence over a certain length of one of
the polypeptide sequences using blastp with the "BLAST 2 Sequences" tool Version 2.0.9
(May-07-1999) set at default parameters. Such a pair of polypeptides may show, for example, at least 50%, at
least 60%, at least 70%, at least 80%, at least 90%, at least 95%, or at least 98% or greater sequence
identity over a certain defined length of one of the polypeptides.
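
The two minimum identity levels given in the variant definitions (25% for nucleic acid sequences and 40% for polypeptide sequences) can be expressed as a simple check, sketched in Python below. The names `VARIANT_THRESHOLDS` and `meets_variant_threshold` are hypothetical, and the percent identity passed in is assumed to have been obtained separately, for example from a pairwise BLAST comparison at default parameters.

```python
# Minimum percent identity for a sequence to qualify as a variant, per the definitions above.
VARIANT_THRESHOLDS = {"nucleic_acid": 25.0, "polypeptide": 40.0}


def meets_variant_threshold(identity_percent, molecule_type):
    """Return True if the identity reaches the minimum for the given molecule type."""
    if molecule_type not in VARIANT_THRESHOLDS:
        raise ValueError("molecule_type must be 'nucleic_acid' or 'polypeptide'")
    return identity_percent >= VARIANT_THRESHOLDS[molecule_type]


if __name__ == "__main__":
    print(meets_variant_threshold(32.5, "nucleic_acid"))
    print(meets_variant_threshold(32.5, "polypeptide"))
```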

THE INVENTION 

In a particular embodiment, cDNA sequences derived from human tissues and cell lines were 
aligned based on nucleotide sequence identity and assembled into "consensus" or "template" 
sequences which are designated by the template identification numbers (template IDs) in column 2 of 
Table 1. The sequence identification numbers (SEQ ID NO:s) corresponding to the template IDs are
shown in column 1. The template sequences have similarity to GenBank sequences, or "hits," as
designated by the GI Numbers in column 3. The statistical probability of each GenBank hit is 
indicated by a probability score in column 4, and the functional annotation corresponding to each 
GenBank hit is listed in column 5. 

The invention incorporates the nucleic acid sequences of these templates as disclosed in the
Sequence Listing and the use of these sequences in the diagnosis and treatment of disease states
characterized by defects in molecules for disease detection and treatment. The invention further
utilizes these sequences in hybridization and amplification technologies, and in particular, in
technologies which assess gene expression patterns correlated with specific cells or tissues and their
responses in vivo or in vitro to pharmaceutical agents, toxins, and other treatments. In this manner,
the sequences of the present invention are used to develop a transcript image for a particular cell or
tissue.

Derivation of Nucleic Acid Sequences 

cDNA was isolated from libraries constructed using RNA derived from normal and diseased
human tissues and cell lines. The human tissues and cell lines used for cDNA library construction
were selected from a broad range of sources to provide a diverse population of cDNAs representative
of gene transcription throughout the human body. Descriptions of the human tissues and cell lines
used for cDNA library construction are provided in the LIFESEQ database (Incyte Genomics, Inc.
(Incyte), Palo Alto CA). Human tissues were broadly selected from, for example, cardiovascular,
dermatologic, endocrine, gastrointestinal, hematopoietic/immune system, musculoskeletal, neural,
reproductive, and urologic sources.

Cell lines used for cDNA library construction were derived from, for example, leukemic
cells, teratocarcinomas, neuroepitheliomas, cervical carcinoma, lung fibroblasts, and endothelial cells.
Such cell lines include, for example, THP-1, Jurkat, HUVEC, hNT2, WI38, HeLa, and other cell
lines commonly used and available from public depositories (American Type Culture Collection,
Manassas VA). Prior to mRNA isolation, cell lines were untreated, treated with a pharmaceutical
agent such as 5-aza-2'-deoxycytidine, treated with an activating agent such as lipopolysaccharide in
the case of leukocytic cell lines, or, in the case of endothelial cell lines, subjected to shear stress.

Sequencing of the cDNAs 

Methods for DNA sequencing are well known in the art. Conventional enzymatic methods
employ the Klenow fragment of DNA polymerase I, SEQUENASE DNA polymerase (U.S.
Biochemical Corporation, Cleveland OH), Taq polymerase (PE Biosystems, Foster City CA),
thermostable T7 polymerase (Amersham Pharmacia Biotech, Inc. (Amersham Pharmacia Biotech),
Piscataway NJ), or combinations of polymerases and proofreading exonucleases such as those found
in the ELONGASE amplification system (Life Technologies Inc. (Life Technologies), Gaithersburg
MD), to extend the nucleic acid sequence from an oligonucleotide primer annealed to the DNA
template of interest. Methods have been developed for the use of both single-stranded and
double-stranded templates. Chain termination reaction products may be electrophoresed on
urea-polyacrylamide gels and detected either by autoradiography (for radioisotope-labeled nucleotides) or
by fluorescence (for fluorophore-labeled nucleotides). Automated methods for mechanized reaction
preparation, sequencing, and analysis using fluorescence detection methods have been developed.
Machines used to prepare cDNAs for sequencing can include the MICROLAB 2200 liquid transfer
system (Hamilton Company (Hamilton), Reno NV), Peltier thermal cycler (PTC200; MJ Research,
Inc. (MJ Research), Watertown MA), and ABI CATALYST 800 thermal cycler (PE Biosystems).
Sequencing can be carried out using, for example, the ABI 373 or 377 (PE Biosystems) or
MEGABACE 1000 (Molecular Dynamics, Inc. (Molecular Dynamics), Sunnyvale CA) DNA
sequencing systems, or other automated and manual sequencing systems well known in the art.

The nucleotide sequences of the Sequence Listing have been prepared by current,
state-of-the-art, automated methods and, as such, may contain occasional sequencing errors or unidentified
nucleotides. Such unidentified nucleotides are designated by an N. These infrequent unidentified
bases do not represent a hindrance to practicing the invention for those skilled in the art. Several
methods employing standard recombinant techniques may be used to correct errors and complete the
missing sequence information. (See, e.g., those described in Ausubel, F.M. et al. (1997) Short
Protocols in Molecular Biology, John Wiley & Sons, New York NY; and Sambrook, J. et al. (1989)
Molecular Cloning, A Laboratory Manual, Cold Spring Harbor Press, Plainview NY.)

Assembly of cDNA Sequences 

Human polynucleotide sequences may be assembled using programs or algorithms well
known in the art. Sequences to be assembled are related, wholly or in part, and may be derived from
a single or many different transcripts. Assembly of the sequences can be performed using such
programs as PHRAP (Phil's Revised Assembly Program) and the GELVIEW fragment assembly
system (GCG), or other methods known in the art.

Alternatively, cDNA sequences are used as "component" sequences that are assembled into
"template" or "consensus" sequences as follows. Sequence chromatograms are processed, verified,
and quality scores are obtained using PHRED. Raw sequences are edited using an editing pathway
known as Block 1 (See, e.g., the LIFESEQ Assembled User Guide, Incyte Genomics, Palo Alto, CA).
A series of BLAST comparisons is performed and low-information segments and repetitive elements
(e.g., dinucleotide repeats, Alu repeats, etc.) are replaced by "n's", or masked, to prevent spurious
matches. Mitochondrial and ribosomal RNA sequences are also removed. The processed sequences
are then loaded into a relational database management system (RDMS) which assigns edited
sequences to existing templates, if available. When additional sequences are added into the RDMS, a
process is initiated which modifies existing templates or creates new templates from works in
progress (i.e., nonfinal assembled sequences) containing queued sequences or the sequences
themselves. After the new sequences have been assigned to templates, the templates can be merged
into bins. If multiple templates exist in one bin, the bin can be split and the templates reannotated.
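
The masking step mentioned above, in which repetitive elements are replaced by "n's", can be illustrated for the simplest case of dinucleotide repeats. The Python sketch below is not the Block 1 editing pathway: the function name `mask_dinucleotide_repeats` and the `min_repeats` threshold of five copies are assumptions chosen only for illustration, and Alu repeats or other low-information segments are not handled.

```python
import re


def mask_dinucleotide_repeats(sequence, min_repeats=5):
    """Replace runs of a repeated two-base unit with 'n' characters of equal length."""
    if min_repeats < 2:
        raise ValueError("min_repeats must be at least 2")
    # A two-base unit followed by at least (min_repeats - 1) further copies of itself.
    pattern = re.compile(r"([ACGT]{2})\1{%d,}" % (min_repeats - 1), re.IGNORECASE)
    return pattern.sub(lambda match: "n" * len(match.group(0)), sequence)


if __name__ == "__main__":
    print(mask_dinucleotide_repeats("GGCACACACACACATT"))
```

Because the masked positions keep their length, coordinates in the rest of the sequence are unaffected, which matters when the masked sequence is later compared by BLAST.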

Once gene bins have been generated based upon sequence alignments, bins are "clone joined" 
based upon clone information. Clone joining occurs when the 5' sequence of one clone is present in 
one bin and the 3' sequence from the same clone is present in a different bin, indicating that the two
bins should be merged into a single bin. Only bins which share at least two different clones are
merged. 
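
The clone joining rule described above can be sketched in Python as follows. This is a simplified illustration, not the assembly software itself: each bin is represented by a hypothetical set of clone identifiers having a 5' or 3' sequence in that bin, the function name `clone_join` is an assumption, and merges are applied transitively using a small union-find structure.

```python
from itertools import combinations


def clone_join(bins, min_shared=2):
    """Group bin names so that bins sharing at least min_shared clones are merged."""
    parent = {name: name for name in bins}

    def find(name):
        # Follow parent links to the group representative, compressing the path.
        while parent[name] != name:
            parent[name] = parent[parent[name]]
            name = parent[name]
        return name

    for a, b in combinations(bins, 2):
        if len(bins[a] & bins[b]) >= min_shared:
            parent[find(a)] = find(b)

    groups = {}
    for name in bins:
        groups.setdefault(find(name), set()).add(name)
    return sorted(groups.values(), key=sorted)


if __name__ == "__main__":
    example = {
        "bin1": {"clone1", "clone2", "clone3"},
        "bin2": {"clone1", "clone2"},
        "bin3": {"clone3", "clone9"},
    }
    print(clone_join(example))
```

In the example, bin1 and bin2 share two clones and are merged, while bin3 shares only one clone with bin1 and stays separate.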

A resultant template sequence may contain either a partial or a full length open reading
frame, or all or part of a genetic regulatory element. This variation is due in part to the fact that the
full length cDNAs of many genes are several hundred, and sometimes several thousand, bases in
length. With current technology, cDNAs comprising the coding regions of large genes cannot be
cloned because of vector limitations, incomplete reverse transcription of the mRNA, or incomplete
"second strand" synthesis. Template sequences may be extended to include additional contiguous
sequences derived from the parent RNA transcript using a variety of methods known to those of skill
in the art. Extension may thus be used to achieve the full length coding sequence of a gene.

Analysis of the cDNA Sequences 

The cDNA sequences are analyzed using a variety of programs and algorithms which are well
known in the art. (See, e.g., Ausubel, 1997, supra, Chapter 7.7; Meyers, R.A. (Ed.) (1995) Molecular
Biology and Biotechnology, Wiley VCH, New York NY, pp. 856-853; and Table 5.) These analyses
comprise both reading frame determinations, e.g., based on triplet codon periodicity for particular
organisms (Fickett, J.W. (1982) Nucleic Acids Res. 10:5303-5318); analyses of potential start and
stop codons; and homology searches.
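
The analysis of potential start and stop codons can be illustrated with a short Python sketch that scans the three forward reading frames. It is a simplified assumption, not one of the cited methods: the function name `find_orfs` is hypothetical, only ATG is treated as a start codon, the reverse strand is ignored, and the codon periodicity approach of Fickett is not implemented.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(sequence, min_codons=1):
    """Return (frame, start, end) for each ATG-to-stop stretch; end is exclusive."""
    sequence = sequence.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(sequence) - 2, 3):
            codon = sequence[i:i + 3]
            if start is None and codon == "ATG":
                start = i  # first in-frame start codon opens a candidate
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((frame, start, i + 3))
                start = None
    return orfs


if __name__ == "__main__":
    print(find_orfs("CCATGAAATAGGG"))
```

A candidate that has a start codon but runs off the end of the sequence without a stop codon is not reported, which matches the case of a template containing only a partial open reading frame.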

Computer programs known to those of skill in the art for performing computer-assisted
searches for amino acid and nucleic acid sequence similarity, include, for example, Basic Local
Alignment Search Tool (BLAST; Altschul, S.F. (1993) J. Mol. Evol. 36:290-300; Altschul, S.F. et al.
(1990) J. Mol. Biol. 215:403-410). BLAST is especially useful in determining exact matches and
comparing two sequence fragments of arbitrary but equal lengths, whose alignment is locally
maximal and for which the alignment score meets or exceeds a threshold or cutoff score set by the
user (Karlin, S. et al. (1988) Proc. Natl. Acad. Sci. USA 85:841-845). Using an appropriate search
tool (e.g., BLAST or HMM), GenBank, SwissProt, BLOCKS, PFAM and other databases may be
searched for sequences containing regions of homology to a query mddt or MDDT of the present
invention.
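
The idea of an equal-length segment pair whose score is locally maximal and meets a user-set cutoff can be illustrated with a toy ungapped comparison. This is not BLAST: there is no word seeding, no substitution matrix, and no significance statistics. The match and mismatch scores, the cutoff default, and the function name are assumptions chosen for illustration.

```python
def best_segment_pairs(query, subject, match=1, mismatch=-2, cutoff=5):
    """Return the best ungapped segment pair on each diagonal scoring >= cutoff.

    Each hit is (score, query_start, subject_start, length), highest score first.
    """
    hits = []
    n, m = len(query), len(subject)
    for offset in range(-(n - 1), m):  # offset = subject index - query index
        q = max(0, -offset)
        s = max(0, offset)
        best = (0, 0, 0, 0)
        run_score, run_start = 0, 0
        k = 0
        while q + k < n and s + k < m:
            if run_score <= 0:
                # A non-positive running score cannot help; restart the segment here.
                run_score, run_start = 0, k
            run_score += match if query[q + k] == subject[s + k] else mismatch
            if run_score > best[0]:
                best = (run_score, q + run_start, s + run_start, k - run_start + 1)
            k += 1
        if best[0] >= cutoff:
            hits.append(best)
    return sorted(hits, reverse=True)
```

Raising `cutoff` trades sensitivity for specificity, which is the role the user-set threshold plays in the description above.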

Other approaches to the identification, assembly, storage, and display of nucleotide and
polypeptide sequences are provided in "Relational Database for Storing Biomolecule Information,"
U.S.S.N. 08/947,843, filed October 9, 1997; "Project-Based Full-Length Biomolecular Sequence
Database," U.S.S.N. 08/811,758, filed March 6, 1997; and "Relational Database and System for
Storing Information Relating to Biomolecular Sequences," U.S.S.N. 09/034,807, filed March 4, 1998,
all of which are incorporated by reference herein in their entirety.

Protein hierarchies can be assigned to the putative encoded polypeptide based on, e.g., motif,
BLAST, or biological analysis. Methods for assigning these hierarchies are described, for example,
in "Database System Employing Protein Function Hierarchies for Viewing Biomolecular Sequence
Data," U.S.S.N. 08/812,290, filed March 6, 1997, incorporated herein by reference.

Human Disease Detection and Treatment Molecule Sequences

The mddt of the present invention may be used for a variety of diagnostic and therapeutic 
purposes. For example, an mddt may be used to diagnose a particular condition, disease, or disorder 
associated with disease detection and treatment molecules. Such conditions, diseases, and disorders 
include, but are not limited to, a cell proliferative disorder, such as actinic keratosis, arteriosclerosis,
atherosclerosis, bursitis, cirrhosis, hepatitis, mixed connective tissue disease (MCTD), myelofibrosis,
paroxysmal nocturnal hemoglobinuria, polycythemia vera, psoriasis, primary thrombocythemia, and
cancers including adenocarcinoma, leukemia, lymphoma, melanoma, myeloma, sarcoma,
teratocarcinoma, and, in particular, a cancer of the adrenal gland, bladder, bone, bone marrow, brain,
breast, cervix, gall bladder, ganglia, gastrointestinal tract, heart, kidney, liver, lung, muscle, ovary,
pancreas, parathyroid, penis, prostate, salivary glands, skin, spleen, testis, thymus, thyroid, and
uterus; and an autoimmune/inflammatory disorder, such as actinic keratosis, acquired
immunodeficiency syndrome (AIDS), Addison's disease, adult respiratory distress syndrome,
allergies, ankylosing spondylitis, amyloidosis, anemia, arteriosclerosis, asthma, atherosclerosis,
autoimmune hemolytic anemia, autoimmune thyroiditis, bronchitis, bursitis, cholecystitis, cirrhosis,
contact dermatitis, Crohn's disease, atopic dermatitis, dermatomyositis, diabetes mellitus,
emphysema, erythroblastosis fetalis, erythema nodosum, atrophic gastritis, glomerulonephritis,
Goodpasture's syndrome, gout, Graves' disease, Hashimoto's thyroiditis, paroxysmal nocturnal
hemoglobinuria, hepatitis, hypereosinophilia, irritable bowel syndrome, episodic lymphopenia with
lymphocytotoxins, mixed connective tissue disease (MCTD), multiple sclerosis, myasthenia gravis,
myocardial or pericardial inflammation, myelofibrosis, osteoarthritis, osteoporosis, pancreatitis,
polycythemia vera, polymyositis, psoriasis, Reiter's syndrome, rheumatoid arthritis, scleroderma,
Sjogren's syndrome, systemic anaphylaxis, systemic lupus erythematosus, systemic sclerosis,
primary thrombocythemia, thrombocytopenic purpura, ulcerative colitis, uveitis, Werner syndrome,
complications of cancer, hemodialysis, and extracorporeal circulation, trauma, and hematopoietic

cancer including lymphoma, leukemia, and myeloma. The mddt can be used to detect the presence
of, or to quantify the amount of, an mddt-related polynucleotide in a sample. This information is then
compared to information obtained from appropriate reference samples, and a diagnosis is established.
Alternatively, a polynucleotide complementary to a given mddt can inhibit or inactivate a
therapeutically relevant gene related to the mddt.

Analysis of mddt Expression Patterns

The expression of mddt may be routinely assessed by hybridization-based methods to
determine, for example, the tissue-specificity, disease-specificity, or developmental stage-specificity
of mddt expression. For example, the level of expression of mddt may be compared among different
cell types or tissues, among diseased and normal cell types or tissues, among cell types or tissues at
different developmental stages, or among cell types or tissues undergoing various treatments. This
type of analysis is useful, for example, to assess the relative levels of mddt expression in fully or
partially differentiated cells or tissues, to determine if changes in mddt expression levels are
correlated with the development or progression of specific disease states, and to assess the response
of a cell or tissue to a specific therapy, for example, in pharmacological or toxicological studies.
Methods for the analysis of mddt expression are based on hybridization and amplification
technologies and include membrane-based procedures such as northern blot analysis, high-throughput
procedures that utilize, for example, microarrays, and PCR-based procedures.
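
One simple way to express such comparisons numerically is a log2 ratio of signal between two conditions. The sketch below is illustrative only: the dictionary layout, the tissue names in the usage note, and the `pseudocount` parameter (added to avoid division by zero) are assumptions, not part of the methods described here.

```python
import math


def log2_fold_changes(condition, baseline, pseudocount=1.0):
    """Return the log2 ratio of expression for every key present in both inputs.

    `condition` and `baseline` map a tissue or cell-type label to a
    non-negative expression signal (hypothetical layout).
    """
    shared = condition.keys() & baseline.keys()
    return {
        key: math.log2((condition[key] + pseudocount) / (baseline[key] + pseudocount))
        for key in shared
    }
```

For example, `log2_fold_changes({"liver": 7, "lung": 1}, {"liver": 1, "lung": 3})` gives a positive value for liver and a negative value for lung, indicating higher and lower relative expression respectively.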

Hybridization and Genetic Analysis

The mddt, their fragments, or complementary sequences, may be used to identify the presence
of and/or to determine the degree of similarity between two (or more) nucleic acid sequences. The
mddt may be hybridized to naturally occurring or recombinant nucleic acid sequences under
appropriately selected temperatures and salt concentrations. Hybridization with a probe based on the
nucleic acid sequence of at least one of the mddt allows for the detection of nucleic acid sequences,
including genomic sequences, which are identical or related to the mddt of the Sequence Listing.
Probes may be selected from non-conserved or unique regions of at least one of the polynucleotides
of SEQ ID NO: 1-14 and tested for their ability to identify or amplify the target nucleic acid sequence
using standard protocols.

Polynucleotide sequences that are capable of hybridizing, in particular, to those shown in
SEQ ID NO: 1-14 and fragments thereof, can be identified using various conditions of stringency.
(See, e.g., Wahl, G.M. and S.L. Berger (1987) Methods Enzymol. 152:399-407; Kimmel, A.R. (1987)
Methods Enzymol. 152:507-511.) Hybridization conditions are discussed in "Definitions."

A probe for use in Southern or northern hybridization may be derived from a fragment of an
mddt sequence, or its complement, that is up to several hundred nucleotides in length and is either
single-stranded or double-stranded. Such probes may be hybridized in solution to biological materials
such as plasmids, bacterial, yeast, or human artificial chromosomes, cleared or sectioned tissues, or to
artificial substrates containing mddt. Microarrays are particularly suitable for identifying the
presence of and detecting the level of expression for multiple genes of interest by examining gene
expression correlated with, e.g., various stages of development, treatment with a drug or compound,
or disease progression. An array analogous to a dot or slot blot may be used to arrange and link
polynucleotides to the surface of a substrate using one or more of the following: mechanical
(vacuum), chemical, thermal, or UV bonding procedures. Such an array may contain any number of
mddt and may be produced by hand or by using available devices, materials, and machines.

Microarrays may be prepared, used, and analyzed using methods known in the art. (See, e.g.,
Brennan, T.M. et al. (1995) U.S. Patent No. 5,474,796; Schena, M. et al. (1996) Proc. Natl. Acad. Sci.
USA 93:10614-10619; Baldeschweiler et al. (1995) PCT application WO95/251116; Shalon, D. et al.
(1995) PCT application WO95/35505; Heller, R.A. et al. (1997) Proc. Natl. Acad. Sci. USA
94:2150-2155; and Heller, M.J. et al. (1997) U.S. Patent No. 5,605,662.)

Probes may be labeled by either PCR or enzymatic techniques using a variety of
commercially available reporter molecules. For example, commercial kits are available for
radioactive and chemiluminescent labeling (Amersham Pharmacia Biotech) and for alkaline
phosphatase labeling (Life Technologies). Alternatively, mddt may be cloned into commercially
available vectors for the production of RNA probes. Such probes may be transcribed in the presence
of at least one labeled nucleotide (e.g., 32P-ATP, Amersham Pharmacia Biotech).

Additionally the polynucleotides of SEQ ID NO: 1-14 or suitable fragments thereof can be
used to isolate full length cDNA sequences utilizing hybridization and/or amplification procedures
well known in the art, e.g., cDNA library screening, PCR amplification, etc. The molecular cloning
of such full length cDNA sequences may employ the method of cDNA library screening with probes
using the hybridization, stringency, washing, and probing strategies described above and in Ausubel,
supra, Chapters 3, 5, and 6. These procedures may also be employed with genomic libraries to isolate
genomic sequences of mddt in order to analyze, e.g., regulatory elements.

Genetic Mapping 

Gene identification and mapping are important in the investigation and treatment of almost all
conditions, diseases, and disorders. Cancer, cardiovascular disease, Alzheimer's disease, arthritis,
diabetes, and mental illnesses are of particular interest. Each of these conditions is more complex
than the single gene defects of sickle cell anemia or cystic fibrosis, with select groups of genes being
predictive of predisposition for a particular condition, disease, or disorder. For example,
cardiovascular disease may result from malfunctioning receptor molecules that fail to clear
cholesterol from the bloodstream, and diabetes may result when a particular individual's immune
system is activated by an infection and attacks the insulin-producing cells of the pancreas. In some
studies, Alzheimer's disease has been linked to a gene on chromosome 21; other studies predict a
different gene and location. Mapping of disease genes is a complex and reiterative process and
generally proceeds from genetic linkage analysis to physical mapping.

As a condition is noted among members of a family, a genetic linkage map traces parts of
chromosomes that are inherited in the same pattern as the condition. Statistics link the inheritance of
particular conditions to particular regions of chromosomes, as defined by RFLP or other markers.
(See, for example, Lander, E. S. and Botstein, D. (1986) Proc. Natl. Acad. Sci. USA 83:7353-7357.)
Occasionally, genetic markers and their locations are known from previous studies. More often,
however, the markers are simply stretches of DNA that differ among individuals. Examples of
genetic linkage maps can be found in various scientific journals or at the Online Mendelian
Inheritance in Man (OMIM) World Wide Web site.
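
One statistic commonly used for this purpose in linkage analysis is the LOD score, which compares the likelihood of the observed recombinants at a proposed recombination fraction against the likelihood under free recombination (0.5). The text above does not name a specific statistic, so the sketch below is an assumption offered for illustration; it covers only the simplest case of fully informative, phase-known meioses, and the function name is hypothetical.

```python
import math


def lod_score(recombinants, meioses, theta):
    """LOD score for linkage at recombination fraction theta versus 0.5.

    Assumes phase-known, fully informative meioses (a simplification).
    """
    if not 0 < theta < 0.5:
        raise ValueError("theta must lie strictly between 0 and 0.5")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must be between 0 and meioses")
    nonrecombinants = meioses - recombinants
    # log10 likelihood of the data if marker and condition are linked at theta
    log_linked = (recombinants * math.log10(theta)
                  + nonrecombinants * math.log10(1 - theta))
    # log10 likelihood of the data if they assort independently
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked
```

With one recombinant in ten meioses and theta = 0.1, the score is about 1.6; by convention a score of 3 or more is usually taken as evidence for linkage.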

In another embodiment of the invention, mddt sequences may be used to generate
hybridization probes useful in chromosomal mapping of naturally occurring genomic sequences.
Either coding or noncoding sequences of mddt may be used, and in some instances, noncoding
sequences may be preferable over coding sequences. For example, conservation of an mddt coding
sequence among members of a multi-gene family may potentially cause undesired cross hybridization
during chromosomal mapping. The sequences may be mapped to a particular chromosome, to a
specific region of a chromosome, or to artificial chromosome constructions, e.g., human artificial
chromosomes (HACs), yeast artificial chromosomes (YACs), bacterial artificial chromosomes
(BACs), bacterial P1 constructions, or single chromosome cDNA libraries. (See, e.g., Harrington, J.J.
et al. (1997) Nat. Genet. 15:345-355; Price, C.M. (1993) Blood Rev. 7:127-134; and Trask, B.J.
(1991) Trends Genet. 7:149-154.)

Fluorescent in situ hybridization (FISH) may be correlated with other physical chromosome
mapping techniques and genetic map data. (See, e.g., Meyers, supra, pp. 965-968.) Correlation
between the location of mddt on a physical chromosomal map and a specific disorder, or a
predisposition to a specific disorder, may help define the region of DNA associated with that
disorder. The mddt sequences may also be used to detect polymorphisms that are genetically linked
to the inheritance of a particular condition, disease, or disorder.

In situ hybridization of chromosomal preparations and genetic mapping techniques, such as
linkage analysis using established chromosomal markers, may be used for extending existing genetic
maps. Often the placement of a gene on the chromosome of another mammalian species, such as
mouse, may reveal associated markers even if the number or arm of the corresponding human
chromosome is not known. These new marker sequences can be mapped to human chromosomes and
may provide valuable information to investigators searching for disease genes using positional
cloning or other gene discovery techniques. Once a disease or syndrome has been crudely correlated
by genetic linkage with a particular genomic region, e.g., ataxia-telangiectasia to 11q22-23, any
sequences mapping to that area may represent associated or regulatory genes for further investigation.
(See, e.g., Gatti, R.A. et al. (1988) Nature 336:577-580.) The nucleotide sequences of the subject
invention may also be used to detect differences in chromosomal architecture due to translocation,
inversion, etc., among normal, carrier, or affected individuals.

Once a disease-associated gene is mapped to a chromosomal region, the gene must be cloned
in order to identify mutations or other alterations (e.g., translocations or inversions) that may be
correlated with disease. This process requires a physical map of the chromosomal region containing
the disease-gene of interest along with associated markers. A physical map is necessary for
determining the nucleotide sequence of and order of marker genes on a particular chromosomal
region. Physical mapping techniques are well known in the art and require the generation of
overlapping sets of cloned DNA fragments from a particular organelle, chromosome, or genome.
These clones are analyzed to reconstruct and catalog their order. Once the position of a marker is
determined, the DNA from that region is obtained by consulting the catalog and selecting clones from
that region. The gene of interest is located through positional cloning techniques using hybridization
or similar methods.

Diagnostic Uses

The mddt of the present invention may be used to design probes useful in diagnostic assays.
Such assays, well known to those skilled in the art, may be used to detect or confirm conditions,
disorders, or diseases associated with abnormal levels of mddt expression. Labeled probes developed
from mddt sequences are added to a sample under hybridizing conditions of desired stringency. In
some instances, mddt, or fragments or oligonucleotides derived from mddt, may be used as primers in
amplification steps prior to hybridization. The amount of hybridization complex formed is quantified
and compared with standards for that cell or tissue. If mddt expression varies significantly from the
standard, the assay indicates the presence of the condition, disorder, or disease. Qualitative or
quantitative diagnostic methods may include northern, dot blot, or other membrane or dip-stick based
technologies or multiple-sample format technologies such as PCR, enzyme-linked immunosorbent
assay (ELISA)-like, pin, or chip-based assays.
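
The comparison of a quantified hybridization signal with standards can be sketched as a z-score against a set of reference measurements. The text does not specify how "varies significantly" is decided, so the z-score test, the `z_cutoff` default of 2.0, and the function name are assumptions made purely for illustration.

```python
from statistics import mean, stdev


def deviates_from_standard(sample_signal, standard_signals, z_cutoff=2.0):
    """Return (z, flagged) for one sample signal against reference signals.

    `standard_signals` are measurements from reference samples of the same
    cell or tissue type; at least two values are required.
    """
    mu = mean(standard_signals)
    sigma = stdev(standard_signals)  # sample standard deviation
    if sigma == 0:
        raise ValueError("reference signals show no variation")
    z = (sample_signal - mu) / sigma
    return z, abs(z) >= z_cutoff
```

A two-sided cutoff is used so that both abnormally high and abnormally low expression are flagged.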

The probes described above may also be used to monitor the progress of conditions,
disorders, or diseases associated with abnormal levels of mddt expression, or to evaluate the efficacy
of a particular therapeutic treatment. The candidate probe may be identified from the mddt that are
specific to a given human tissue and have not been observed in GenBank or other genome databases.
Such a probe may be used in animal studies, preclinical tests, clinical trials, or in monitoring the
treatment of an individual patient. In a typical process, standard expression is established by methods
well known in the art for use as a basis of comparison; samples from patients affected by the disorder
or disease are combined with the probe to evaluate any deviation from the standard profile; and a
therapeutic agent is administered and effects are monitored to generate a treatment profile. Efficacy
is evaluated by determining whether the expression progresses toward or returns to the standard
normal pattern. Treatment profiles may be generated over a period of several days or several months.
Statistical methods well known to those skilled in the art may be used to determine the significance of
such therapeutic agents.

The polynucleotides are also useful for identifying individuals from minute biological
samples, for example, by matching the RFLP pattern of a sample's DNA to that of an individual's
DNA. The polynucleotides of the present invention can also be used to determine the actual
base-by-base DNA sequence of selected portions of an individual's genome. These sequences can be
used to prepare PCR primers for amplifying and isolating such selected DNA, which can then be
sequenced. Using this technique, an individual can be identified through a unique set of DNA
sequences. Once a unique ID database is established for an individual, positive identification of that
individual can be made from extremely small tissue samples.
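
RFLP pattern matching can be illustrated computationally by digesting two sequences in silico and comparing the resulting fragment lengths. The sketch below is hedged: it treats the DNA as linear, uses a single enzyme (EcoRI, which cuts G^AATTC) as the default, and compares sorted fragment lengths within a tolerance; the function names and the `tolerance` parameter are hypothetical.

```python
def restriction_fragments(seq, site="GAATTC", cut_offset=1):
    """Return sorted fragment lengths from cutting a linear sequence at each site.

    `cut_offset` is the cut position within the site (1 = after the G for EcoRI).
    """
    seq = seq.upper()
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos + cut_offset)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    return sorted(b - a for a, b in zip(bounds, bounds[1:]))


def same_rflp_pattern(seq_a, seq_b, tolerance=0):
    """True when both sequences give the same number of fragments of matching size."""
    frags_a = restriction_fragments(seq_a)
    frags_b = restriction_fragments(seq_b)
    return len(frags_a) == len(frags_b) and all(
        abs(x - y) <= tolerance for x, y in zip(frags_a, frags_b)
    )
```

A single base change inside a recognition site removes a cut and so changes the fragment pattern, which is the polymorphism an RFLP comparison detects.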

In a particular aspect, oligonucleotide primers derived from the mddt of the invention may be
used to detect single nucleotide polymorphisms (SNPs). SNPs are substitutions, insertions and
deletions that are a frequent cause of inherited or acquired genetic disease in humans. Methods of
SNP detection include, but are not limited to, single-stranded conformation polymorphism (SSCP)
and fluorescent SSCP (fSSCP) methods. In SSCP, oligonucleotide primers derived from the
polynucleotide sequences encoding MDDT are used to amplify DNA using the polymerase chain
reaction (PCR). The DNA may be derived, for example, from diseased or normal tissue, biopsy
samples, bodily fluids, and the like. SNPs in the DNA cause differences in the secondary and tertiary
structures of PCR products in single-stranded form, and these differences are detectable using gel
electrophoresis in non-denaturing gels. In fSSCP, the oligonucleotide primers are fluorescently
labeled, which allows detection of the amplimers in high-throughput equipment such as DNA
sequencing machines. Additionally, sequence database analysis methods, termed in silico SNP

(isSNP), are capable of identifying polymorphisms by comparing the sequence of individual
overlapping DNA fragments which assemble into common consensus sequences. These computer-based
methods filter out sequence variations due to laboratory preparation of DNA and sequencing
errors using statistical models and automated analyses of DNA sequence chromatograms. In the
alternative, SNPs may be detected and characterized by mass spectrometry using, for example, the
high throughput MASSARRAY system (Sequenom, Inc., San Diego CA).

DNA-based identification techniques are critical in forensic technology. DNA sequences
taken from very small biological samples such as tissues, e.g., hair or skin, or body fluids, e.g., blood,
saliva, semen, etc., can be amplified using, e.g., PCR, to identify individuals. (See, e.g., Erlich, H.
(1992) PCR Technology, Freeman and Co., New York, NY). Similarly, polynucleotides of the
present invention can be used as polymorphic markers.
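
The in silico SNP approach described above, in which overlapping fragments are compared column by column against a shared consensus, can be sketched as follows. This is a deliberately crude stand-in: the statistical models and chromatogram analysis mentioned in the text are replaced by a simple minimum-count rule, the reads are assumed to be already aligned and padded to equal length, and the function name and `min_minor_count` parameter are hypothetical.

```python
from collections import Counter


def call_snps(aligned_reads, min_minor_count=2):
    """Return {column: Counter} for columns where two or more bases are well supported.

    `aligned_reads` are equal-length strings aligned to one consensus; any
    character other than A, C, G, or T (such as "-") is treated as no coverage.
    """
    if len({len(read) for read in aligned_reads}) > 1:
        raise ValueError("reads must be padded to the same aligned length")
    snps = {}
    for col, bases in enumerate(zip(*aligned_reads)):
        counts = Counter(b for b in bases if b in "ACGT")
        # A base seen only once is treated as a likely sequencing error.
        supported = [b for b, n in counts.items() if n >= min_minor_count]
        if len(supported) >= 2:
            snps[col] = counts
    return snps
```

Requiring each allele to appear in at least two independent fragments is one simple way to filter out isolated sequencing errors, at the cost of missing rare variants in shallow coverage.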

There is also a need for reagents capable of identifying the source of a particular tissue.
Appropriate reagents can comprise, for example, DNA probes or primers prepared from the
sequences of the present invention that are specific for particular tissues. Panels of such reagents can
identify tissue by species and/or by organ type. In a similar fashion, these reagents can be used to
screen tissue cultures for contamination.

The polynucleotides of the present invention can also be used as molecular weight markers on
nucleic acid gels or Southern blots, as diagnostic probes for the presence of a specific mRNA in a
particular cell type, in the creation of subtracted cDNA libraries which aid in the discovery of novel
polynucleotides, in selection and synthesis of oligomers for attachment to an array or other support,
and as an antigen to elicit an immune response.

Disease Model Systems Using mddt

The mddt of the invention or their mammalian homologs may be "knocked out" in an animal
model system using homologous recombination in embryonic stem (ES) cells. Such techniques are
well known in the art and are useful for the generation of animal models of human disease. (See, e.g.,
U.S. Patent Number 5,175,383 and U.S. Patent Number 5,767,337.) For example, mouse ES cells,
such as the mouse 129/SvJ cell line, are derived from the early mouse embryo and grown in culture.
The ES cells are transformed with a vector containing the gene of interest disrupted by a marker gene,
e.g., the neomycin phosphotransferase gene (neo; Capecchi, M.R. (1989) Science 244:1288-1292).
The vector integrates into the corresponding region of the host genome by homologous
recombination. Alternatively, homologous recombination takes place using the Cre-loxP system to
knockout a gene of interest in a tissue- or developmental stage-specific manner (Marth, J.D. (1996)
Clin. Invest. 97:1999-2002; Wagner, K.U. et al. (1997) Nucleic Acids Res. 25:4323-4330).
Transformed ES cells are identified and microinjected into mouse cell blastocysts such as those from
the C57BL/6 mouse strain. The blastocysts are surgically transferred to pseudopregnant dams, and
the resulting chimeric progeny are genotyped and bred to produce heterozygous or homozygous
strains. Transgenic animals thus generated may be tested with potential therapeutic or toxic agents.

The mddt of the invention may also be manipulated in vitro in ES cells derived from human
blastocysts. Human ES cells have the potential to differentiate into at least eight separate cell
lineages including endoderm, mesoderm, and ectodermal cell types. These cell lineages differentiate
into, for example, neural cells, hematopoietic lineages, and cardiomyocytes (Thomson, J.A., et al.
(1998) Science 282:1145-1147).

The mddt of the invention can also be used to create "knockin" humanized animals (pigs) or
transgenic animals (mice or rats) to model human disease. With knockin technology, a region of
mddt is injected into animal ES cells, and the injected sequence integrates into the animal cell
genome. Transformed cells are injected into blastulae, and the blastulae are implanted as described
above. Transgenic progeny or inbred lines are studied and treated with potential pharmaceutical
agents to obtain information on treatment of a human disease. Alternatively, a mammal inbred to
overexpress mddt, resulting, e.g., in the secretion of MDDT in its milk, may also serve as a
convenient source of that protein (Janne, J. et al. (1998) Biotechnol. Annu. Rev. 4:55-74).

Screening Assays

MDDT encoded by polynucleotides of the present invention may be used to screen for
molecules that bind to or are bound by the encoded polypeptides. The binding of the polypeptide and
the molecule may activate (agonist), increase, inhibit (antagonist), or decrease activity of the
polypeptide or the bound molecule. Examples of such molecules include antibodies,
oligonucleotides, proteins (e.g., receptors), or small molecules.

Preferably, the molecule is closely related to the natural ligand of the polypeptide, e.g., a
ligand or fragment thereof, a natural substrate, or a structural or functional mimetic. (See, Coligan et
al., (1991) Current Protocols in Immunology 1(2): Chapter 5.) Similarly, the molecule can be closely
related to the natural receptor to which the polypeptide binds, or to at least a fragment of the receptor,
e.g., the active site. In either case, the molecule can be rationally designed using known techniques.
Preferably, the screening for these molecules involves producing appropriate cells which express the
polypeptide, either as a secreted protein or on the cell membrane. Preferred cells include cells from
mammals, yeast, Drosophila, or E. coli. Cells expressing the polypeptide or cell membrane fractions
which contain the expressed polypeptide are then contacted with a test compound and binding,
stimulation, or inhibition of activity of either the polypeptide or the molecule is analyzed.

An assay may simply test binding of a candidate compound to the polypeptide, wherein
binding is detected by a fluorophore, radioisotope, enzyme conjugate, or other detectable label.
Alternatively, the assay may assess binding in the presence of a labeled competitor.
Additionally, the assay can be carried out using cell-free preparations, polypeptide/molecule
affixed to a solid support, chemical libraries, or natural product mixtures. The assay may also simply
comprise the steps of mixing a candidate compound with a solution containing a polypeptide,
measuring polypeptide/molecule activity or binding, and comparing the polypeptide/molecule activity

or binding to a standard.

Preferably, an ELISA assay using, e.g., a monoclonal or polyclonal antibody, can measure
polypeptide level in a sample. The antibody can measure polypeptide level by either binding, directly
or indirectly, to the polypeptide or by competing with the polypeptide for a substrate.

All of the above assays can be used in a diagnostic or prognostic context. The molecules
discovered using these assays can be used to treat disease or to bring about a particular result in a
patient (e.g., blood vessel growth) by activating or inhibiting the polypeptide/molecule. Moreover, the
assays can discover agents which may inhibit or enhance the production of the polypeptide from
suitably manipulated cells or tissues.

Transcript Imaging

Another embodiment relates to the use of mddt to develop a transcript image of a tissue or
cell type. A transcript image is the collective pattern of gene expression by a particular tissue or cell
type under given conditions and at a given time. This pattern of gene expression is defined by the
number of expressed genes, their abundance, and their function. Thus the mddt of the present
invention may be used to develop a transcript image of a tissue or cell type by hybridizing, preferably
in a microarray format, the mddt of the present invention to the totality of transcripts or reverse
transcripts of a tissue or cell type. The resultant transcript image would provide a profile of gene
activity pertaining to disease detection and treatment.

Transcript images which profile mddt expression may be generated using transcripts isolated
from tissues, cell lines, biopsies, or other biological samples. The transcript image may thus reflect
mddt expression in vivo, as in the case of a tissue or biopsy sample, or in vitro, as in the case of a cell
line. Transcript images may be used to profile mddt expression in distinct tissue types. This process
can be used to determine disease detection and treatment molecule activity in a particular tissue type
relative to this activity in a different tissue type. Transcript images may be used to generate a profile
of mddt expression characteristic of diseased tissue. Transcript images of tissues before and after
treatment may be used for diagnostic purposes, to monitor the progression of disease, and to monitor
the efficacy of drug treatments for diseases which affect the activity of disease detection and
treatment molecules.

Transcript images which profile mddt expression may also be used in conjunction with in
vitro model systems and preclinical evaluation of pharmaceuticals. Transcript images of cell lines
can be used to assess disease detection and treatment molecule activity and/or to identify cell lines
that lack or misregulate this activity. Such cell lines may then be treated with pharmaceutical agents,
and a transcript image following treatment may indicate the efficacy of these agents in restoring
desired levels of this activity. A similar approach may be used to assess the toxicity of
pharmaceutical agents as reflected by undesirable changes in disease detection and treatment
molecule activity. Candidate pharmaceutical agents may be evaluated by comparing their associated
transcript images with those of pharmaceutical agents of known effectiveness.
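
Comparing transcript images can be sketched by treating each image as a vector of expression values over the same ordered set of mddt and ranking candidates by their correlation with a reference image. The choice of Pearson correlation, the data layout, and the function and agent names are assumptions for illustration; the text does not prescribe a particular similarity measure.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length expression profiles."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("profiles must have the same length (at least 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        raise ValueError("a profile with no variation has no defined correlation")
    return cov / (sx * sy)


def rank_candidates(reference_image, candidate_images):
    """Return candidate names ordered from most to least similar to the reference."""
    scored = [
        (pearson(reference_image, image), name)
        for name, image in candidate_images.items()
    ]
    return [name for _, name in sorted(scored, reverse=True)]
```

Here `reference_image` would be the transcript image associated with an agent of known effectiveness, and each entry of `candidate_images` the image obtained after treatment with a candidate agent.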

Antisense Molecules 

The polynucleotides of the present invention are useful in antisense technology. Antisense
technology or therapy relies on the modulation of expression of a target protein through the specific
binding of an antisense sequence to a target sequence encoding the target protein or directing its
expression. (See, e.g., Agrawal, S., ed. (1996) Antisense Therapeutics, Humana Press Inc., Totowa
NJ; Alama, A. et al. (1997) Pharmacol. Res. 36(3):171-178; Crooke, S.T. (1997) Adv. Pharmacol.
40:1-49; Sharma, H.W. and R. Narayanan (1995) Bioessays 17(12):1055-1063; and Lavrosky, Y. et
al. (1997) Biochem. Mol. Med. 62(1):11-22.) An antisense sequence is a polynucleotide sequence
capable of specifically hybridizing to at least a portion of the target sequence. Antisense sequences
bind to cellular mRNA and/or genomic DNA, affecting translation and/or transcription. Antisense
sequences can be DNA, RNA, or nucleic acid mimics and analogs. (See, e.g., Rossi, J.J. et al. (1991)
Antisense Res. Dev. 1(3):285-288; Lee, R. et al. (1998) Biochemistry 37(3):900-1010; Pardridge,
W.M. et al. (1995) Proc. Natl. Acad. Sci. USA 92(12):5592-5596; and Nielsen, P.E. and Haaima, G.
(1997) Chem. Soc. Rev. 96:73-78.) Typically, the binding which results in modulation of expression
occurs through hybridization or binding of complementary base pairs. Antisense sequences can also
bind to DNA duplexes through specific interactions in the major groove of the double helix.
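
Because binding occurs through complementary base pairing, a candidate antisense DNA oligonucleotide for a given stretch of target mRNA is simply the reverse complement of that stretch. The following sketch shows the computation; the function name and the convention of returning the oligo written 5' to 3' are assumptions, and no claim is made about which target windows would be effective in practice.

```python
_DNA_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def antisense_dna(target_mrna, start, length):
    """Return a DNA oligo (5'->3') complementary to target_mrna[start:start+length].

    The target may be written with U (RNA) or T (cDNA); U is read as T.
    """
    window = target_mrna[start:start + length].upper().replace("U", "T")
    if len(window) != length or set(window) - set("ACGT"):
        raise ValueError("window is out of range or contains non-ACGU symbols")
    # Complement each base, then reverse so the result reads 5' to 3'.
    return window.translate(_DNA_COMPLEMENT)[::-1]
```

For instance, the first six bases of a target beginning AUGGCC pair with the oligo GGCCAT.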

The polynucleotides of the present invention and fragments thereof can be used as antisense
sequences to modify the expression of the polypeptide encoded by mddt. The antisense sequences
can be produced ex vivo, such as by using any of the ABI nucleic acid synthesizer series (PE
Biosystems) or other automated systems known in the art. Antisense sequences can also be produced
biologically, such as by transforming an appropriate host cell with an expression vector containing
the sequence of interest. (See, e.g., Agrawal, supra.)

In therapeutic use, any gene delivery system suitable for introduction of the antisense
sequences into appropriate target cells can be used. Antisense sequences can be delivered
intracellularly in the form of an expression plasmid which, upon transcription, produces a sequence
complementary to at least a portion of the cellular sequence encoding the target protein. (See, e.g.,
Slater, J.E., et al. (1998) J. Allergy Clin. Immunol. 102(3):469-475; and Scanlon, K.J., et al. (1995)
9(13):1288-1296.) Antisense sequences can also be introduced intracellularly through the use of viral
vectors, such as retrovirus and adeno-associated virus vectors. (See, e.g., Miller, A.D. (1990) Blood
76:271; Ausubel, F.M. et al. (1995) Current Protocols in Molecular Biology, John Wiley & Sons,
New York NY; Uckert, W. and W. Walther (1994) Pharmacol. Ther. 63(3):323-347.) Other gene

35 delivery mechanisms include liposome-derived systems, artificial viral envelopes, and other systems 
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known m the art. (See. e.g., Rossi. J.J. (1995) Br. Med. Bull. 5](l):2I7-225; Boado. R.J. ei al. (1998) 
J. Pharm. Sci. 87(1 1); 1308-1315; and Morris, M.C. et al. (1997) Nucleic Acids Res. 25(14)-.2730- 
2736.) 

Expression 

In order to express a biologically active MDDT, the nucleotide sequences encoding MDDT or
fragments thereof may be inserted into an appropriate expression vector, i.e., a vector which contains
the necessary elements for transcriptional and translational control of the inserted coding sequence in
a suitable host. Methods which are well known to those skilled in the art may be used to construct
expression vectors containing sequences encoding MDDT and appropriate transcriptional and
translational control elements. These methods include in vitro recombinant DNA techniques,
synthetic techniques, and in vivo genetic recombination. (See, e.g., Sambrook, supra, Chapters 4, 8,
16, and 17; and Ausubel, supra, Chapters 9, 10, 13, and 16.)

A variety of expression vector/host systems may be utilized to contain and express sequences
encoding MDDT. These include, but are not limited to, microorganisms such as bacteria transformed
with recombinant bacteriophage, plasmid, or cosmid DNA expression vectors; yeast transformed with
yeast expression vectors; insect cell systems infected with viral expression vectors (e.g., baculovirus);
plant cell systems transformed with viral expression vectors (e.g., cauliflower mosaic virus, CaMV,
or tobacco mosaic virus, TMV) or with bacterial expression vectors (e.g., Ti or pBR322 plasmids); or
animal (mammalian) cell systems. (See, e.g., Sambrook, supra; Ausubel, 1995, supra; Van Heeke, G.
and S.M. Schuster (1989) J. Biol. Chem. 264:5503-5509; Bitter, G.A. et al. (1987) Methods Enzymol.
153:516-544; Scorer, C.A. et al. (1994) Bio/Technology 12:181-184; Engelhard, E.K. et al. (1994)
Proc. Natl. Acad. Sci. USA 91:3224-3227; Sandig, V. et al. (1996) Hum. Gene Ther. 7:1937-1945;
Takamatsu, N. (1987) EMBO J. 6:307-311; Coruzzi, G. et al. (1984) EMBO J. 3:1671-1680; Broglie,
R. et al. (1984) Science 224:838-843; Winter, J. et al. (1991) Results Probl. Cell Differ. 17:85-105;
The McGraw Hill Yearbook of Science and Technology (1992) McGraw Hill, New York NY, pp.
191-196; Logan, J. and T. Shenk (1984) Proc. Natl. Acad. Sci. USA 81:3655-3659; and Harrington,
J.J. et al. (1997) Nat. Genet. 15:345-355.) Expression vectors derived from retroviruses,
adenoviruses, or herpes or vaccinia viruses, or from various bacterial plasmids, may be used for
delivery of nucleotide sequences to the targeted organ, tissue, or cell population. (See, e.g., Di
Nicola, M. et al. (1998) Cancer Gen. Ther. 5(6):350-356; Yu, M. et al. (1993) Proc. Natl. Acad. Sci.
USA 90(13):6340-6344; Buller, R.M. et al. (1985) Nature 317(6040):813-815; McGregor, D.P. et al.
(1994) Mol. Immunol. 31(3):219-226; and Verma, I.M. and N. Somia (1997) Nature 389:239-242.)
The invention is not limited by the host cell employed.

For long term production of recombinant proteins in mammalian systems, stable expression
of MDDT in cell lines is preferred. For example, sequences encoding MDDT can be transformed into
cell lines using expression vectors which may contain viral origins of replication and/or endogenous
expression elements and a selectable marker gene on the same or on a separate vector. Any number
of selection systems may be used to recover transformed cell lines. (See, e.g., Wigler, M. et al.
(1977) Cell 11:223-232; Lowy, I. et al. (1980) Cell 22:817-823; Wigler, M. et al. (1980) Proc. Natl.
Acad. Sci. USA 77:3567-3570; Colbere-Garapin, F. et al. (1981) J. Mol. Biol. 150:1-14; Hartman,
S.C. and R.C. Mulligan (1988) Proc. Natl. Acad. Sci. USA 85:8047-8051; Rhodes, C.A. (1995)
Methods Mol. Biol. 55:121-131.)

Therapeutic Uses of mddt 

The mddt of the invention may be used for somatic or germline gene therapy. Gene therapy
may be performed to (i) correct a genetic deficiency (e.g., in the cases of severe combined
immunodeficiency (SCID)-X1 disease characterized by X-linked inheritance (Cavazzana-Calvo, M. et
al. (2000) Science 288:669-672), severe combined immunodeficiency syndrome associated with an
inherited adenosine deaminase (ADA) deficiency (Blaese, R.M. et al. (1995) Science 270:475-480;
Bordignon, C. et al. (1995) Science 270:470-475), cystic fibrosis (Zabner, J. et al. (1993) Cell 75:207-216;
Crystal, R.G. et al. (1995) Hum. Gene Therapy 6:643-666; Crystal, R.G. et al. (1995) Hum. Gene
Therapy 6:667-703), thalassemias, familial hypercholesterolemia, and hemophilia resulting from
Factor VIII or Factor IX deficiencies (Crystal, R.G. (1995) Science 270:404-410; Verma, I.M. and
Somia, N. (1997) Nature 389:239-242)), (ii) express a conditionally lethal gene product (e.g., in the
case of cancers which result from unregulated cell proliferation), or (iii) express a protein which
affords protection against intracellular parasites (e.g., against human retroviruses, such as human
immunodeficiency virus (HIV) (Baltimore, D. (1988) Nature 335:395-396; Poeschla, E. et al. (1996)
Proc. Natl. Acad. Sci. USA 93:11395-11399), hepatitis B or C virus (HBV, HCV); fungal parasites,
such as Candida albicans and Paracoccidioides brasiliensis; and protozoan parasites such as
Plasmodium falciparum and Trypanosoma cruzi). In the case where a genetic deficiency in mddt
expression or regulation causes disease, the expression of mddt from an appropriate population of
transduced cells may alleviate the clinical manifestations caused by the genetic deficiency.

In a further embodiment of the invention, diseases or disorders caused by deficiencies in
mddt are treated by constructing mammalian expression vectors comprising mddt and introducing
these vectors by mechanical means into mddt-deficient cells. Mechanical transfer technologies for
use with cells in vivo or ex vitro include (i) direct DNA microinjection into individual cells, (ii)
ballistic gold particle delivery, (iii) liposome-mediated transfection, (iv) receptor-mediated gene
transfer, and (v) the use of DNA transposons (Morgan, R.A. and Anderson, W.F. (1993) Annu. Rev.
Biochem. 62:191-217; Ivics, Z. (1997) Cell 91:501-510; Boulay, J-L. and Recipon, H. (1998) Curr.
Opin. Biotechnol. 9:445-450).

Expression vectors that may be effective for the expression of mddt include, but are not
limited to, the PCDNA 3.1, EPITAG, PRCCMV2, PREP, PVAX vectors (Invitrogen, Carlsbad CA),
PCMV-SCRIPT, PCMV-TAG, PEGSH/PERV (Stratagene, La Jolla CA), and PTET-OFF,
PTET-ON, PTRE2, PTRE2-LUC, PTK-HYG (Clontech, Palo Alto CA). The mddt of the invention
may be expressed using (i) a constitutively active promoter (e.g., from cytomegalovirus (CMV),
Rous sarcoma virus (RSV), SV40 virus, thymidine kinase (TK), or β-actin genes), (ii) an inducible
promoter (e.g., the tetracycline-regulated promoter (Gossen, M. and Bujard, H. (1992) Proc. Natl.
Acad. Sci. U.S.A. 89:5547-5551; Gossen, M. et al. (1995) Science 268:1766-1769; Rossi, F.M.V.
and Blau, H.M. (1998) Curr. Opin. Biotechnol. 9:451-456), commercially available in the T-REX
plasmid (Invitrogen); the ecdysone-inducible promoter (available in the plasmids PVGRXR and
PIND; Invitrogen); the FK506/rapamycin inducible promoter; or the RU486/mifepristone inducible
promoter (Rossi, F.M.V. and Blau, H.M., supra)), or (iii) a tissue-specific promoter or the native
promoter of the endogenous gene encoding MDDT from a normal individual.

Commercially available liposome transformation kits (e.g., the PERFECT LIPID
TRANSFECTION KIT, available from Invitrogen) allow one with ordinary skill in the art to deliver
polynucleotides to target cells in culture and require minimal effort to optimize experimental
parameters. In the alternative, transformation is performed using the calcium phosphate method
(Graham, F.L. and Eb, A.J. (1973) Virology 52:456-467), or by electroporation (Neumann, E. et al.
(1982) EMBO J. 1:841-845). The introduction of DNA to primary cells requires modification of
these standardized mammalian transfection protocols.

In another embodiment of the invention, diseases or disorders caused by genetic defects with
respect to mddt expression are treated by constructing a retrovirus vector consisting of (i) the mddt of
the invention under the control of an independent promoter or the retrovirus long terminal repeat
(LTR) promoter, (ii) appropriate RNA packaging signals, and (iii) a Rev-responsive element (RRE)
along with additional retrovirus cis-acting RNA sequences and coding sequences required for
efficient vector propagation. Retrovirus vectors (e.g., PFB and PFBNEO) are commercially available
(Stratagene) and are based on published data (Riviere, I. et al. (1995) Proc. Natl. Acad. Sci. U.S.A.
92:6733-6737), incorporated by reference herein. The vector is propagated in an appropriate vector
producing cell line (VPCL) that expresses an envelope gene with a tropism for receptors on the target
cells or a promiscuous envelope protein such as VSVg (Armentano, D. et al. (1987) J. Virol. 61:1647-1650;
Bender, M.A. et al. (1987) J. Virol. 61:1639-1646; Adam, M.A. and Miller, A.D. (1988) J.
Virol. 62:3802-3806; Dull, T. et al. (1998) J. Virol. 72:8463-8471; Zufferey, R. et al. (1998) J. Virol.
72:9873-9880). U.S. Patent Number 5,910,434 to Rigg ("Method for obtaining retrovirus packaging
cell lines producing high transducing efficiency retroviral supernatant") discloses a method for
obtaining retrovirus packaging cell lines and is hereby incorporated by reference. Propagation of
retrovirus vectors, transduction of a population of cells (e.g., CD4+ T-cells), and the return of
transduced cells to a patient are procedures well known to persons skilled in the art of gene therapy
and have been well documented (Ranga, U. et al. (1997) J. Virol. 71:7020-7029; Bauer, G. et al.
(1997) Blood 89:2259-2267; Bonyhadi, M.L. (1997) J. Virol. 71:4707-4716; Ranga, U. et al. (1998)
Proc. Natl. Acad. Sci. U.S.A. 95:1201-1206; Su, L. (1997) Blood 89:2283-2290).

In the alternative, an adenovirus-based gene therapy delivery system is used to deliver mddt
to cells which have one or more genetic abnormalities with respect to the expression of mddt. The
construction and packaging of adenovirus-based vectors are well known to those with ordinary skill in
the art. Replication defective adenovirus vectors have proven to be versatile for importing genes
encoding immunoregulatory proteins into intact islets in the pancreas (Csete, M.E. et al. (1995)
Transplantation 27:263-268). Potentially useful adenoviral vectors are described in U.S. Patent
Number 5,707,618 to Armentano ("Adenovirus vectors for gene therapy"), hereby incorporated by
reference. For adenoviral vectors, see also Antinozzi, P.A. et al. (1999) Annu. Rev. Nutr. 19:511-544
and Verma, I.M. and Somia, N. (1997) Nature 389:239-242, both incorporated by reference herein.

In another alternative, a herpes-based gene therapy delivery system is used to deliver mddt to
target cells which have one or more genetic abnormalities with respect to the expression of mddt.
The use of herpes simplex virus (HSV)-based vectors may be especially valuable for introducing
mddt to cells of the central nervous system, for which HSV has a tropism. The construction and
packaging of herpes-based vectors are well known to those with ordinary skill in the art. A
replication-competent herpes simplex virus (HSV) type 1-based vector has been used to deliver a
reporter gene to the eyes of primates (Liu, X. et al. (1999) Exp. Eye Res. 169:385-395). The
construction of a HSV-1 virus vector has also been disclosed in detail in U.S. Patent Number
5,804,413 to DeLuca ("Herpes simplex virus strains for gene transfer"), which is hereby incorporated
by reference. U.S. Patent Number 5,804,413 teaches the use of recombinant HSV d92 which consists
of a genome containing at least one exogenous gene to be transferred to a cell under the control of the
appropriate promoter for purposes including human gene therapy. Also taught by this patent are the
construction and use of recombinant HSV strains deleted for ICP4, ICP27 and ICP22. For HSV
vectors, see also Goins, W.F. et al. (1999) J. Virol. 73:519-532 and Xu, H. et al. (1994) Dev. Biol.
163:152-161, hereby incorporated by reference. The manipulation of cloned herpesvirus sequences,
the generation of recombinant virus following the transfection of multiple plasmids containing
different segments of the large herpesvirus genomes, the growth and propagation of herpesvirus, and
the infection of cells with herpesvirus are techniques well known to those of ordinary skill in the art.

In another alternative, an alphavirus (positive, single-stranded RNA virus) vector is used to
deliver mddt to target cells. The biology of the prototypic alphavirus, Semliki Forest Virus (SFV),
has been studied extensively and gene transfer vectors have been based on the SFV genome (Garoff,
H. and Li, K-J. (1998) Curr. Opin. Biotech. 9:464-469). During alphavirus RNA replication, a
subgenomic RNA is generated that normally encodes the viral capsid proteins. This subgenomic
RNA replicates to higher levels than the full-length genomic RNA, resulting in the overproduction of
capsid proteins relative to the viral proteins with enzymatic activity (e.g., protease and polymerase).
Similarly, inserting mddt into the alphavirus genome in place of the capsid-coding region results in
the production of a large number of mddt RNAs and the synthesis of high levels of MDDT in vector
transduced cells. While alphavirus infection is typically associated with cell lysis within a few days,
the ability to establish a persistent infection in hamster normal kidney cells (BHK-21) with a variant
of Sindbis virus (SIN) indicates that the lytic replication of alphaviruses can be altered to suit the
needs of the gene therapy application (Dryga, S.A. et al. (1997) Virology 228:74-83). The wide host
range of alphaviruses will allow the introduction of MDDT into a variety of cell types. The specific
transduction of a subset of cells in a population may require the sorting of cells prior to transduction.
The methods of manipulating infectious cDNA clones of alphaviruses, performing alphavirus cDNA
and RNA transfections, and performing alphavirus infections, are well known to those with ordinary
skill in the art.

Antibodies

Anti-MDDT antibodies may be used to analyze protein expression levels. Such antibodies
include, but are not limited to, polyclonal, monoclonal, chimeric, single chain, and Fab fragments.
For descriptions of and protocols of antibody technologies, see, e.g., Pound, J.D. (1998)
Immunochemical Protocols, Humana Press, Totowa NJ.

The amino acid sequence encoded by the mddt of the Sequence Listing may be analyzed by
appropriate software (e.g., LASERGENE NAVIGATOR software, DNASTAR) to determine regions
of high immunogenicity. The optimal sequences for immunization are selected from the C-terminus,
the N-terminus, and those intervening, hydrophilic regions of the polypeptide which are likely to be
exposed to the external environment when the polypeptide is in its natural conformation. Analysis
used to select appropriate epitopes is also described by Ausubel (1997, supra, Chapter 11.7). Peptides
used for antibody induction do not need to have biological activity; however, they must be antigenic.
Peptides used to induce specific antibodies may have an amino acid sequence consisting of at least five
amino acids, preferably at least 10 amino acids, and most preferably 15 amino acids. A peptide which
mimics an antigenic fragment of the natural polypeptide may be fused with another protein such as
keyhole limpet hemocyanin (KLH; Sigma, St. Louis MO) for antibody production. A peptide
encompassing an antigenic region may be expressed from an mddt, synthesized as described above, or
purified from human cells.
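
The selection of hydrophilic candidate regions can be sketched in Python as a sliding-window hydropathy scan. This is a stand-in for illustration and is not the method used by the LASERGENE software named above: it assumes the published Kyte-Doolittle hydropathy scale, a 15-residue window chosen to match the most preferred peptide length, and a made-up input sequence.

```python
# Kyte-Doolittle hydropathy values; more negative means more hydrophilic.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def hydropathy_windows(protein: str, window: int = 15):
    """Score every window; return (mean score, start, peptide), most hydrophilic first."""
    scored = []
    for start in range(len(protein) - window + 1):
        peptide = protein[start:start + window]
        mean = sum(KD[aa] for aa in peptide) / window  # unknown residues raise KeyError
        scored.append((mean, start, peptide))
    return sorted(scored)


# Hypothetical polypeptide: a hydrophobic stretch followed by a charged stretch.
candidates = hydropathy_windows("I" * 15 + "K" * 15)
print(candidates[0])
```

Hydrophilicity alone does not establish antigenicity or surface exposure, so windows ranked this way would only be a starting list for the epitope analysis the text describes.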

Procedures well known in the art may be used for the production of antibodies. Various hosts
including mice, goats, and rabbits, may be immunized by injection with a peptide. Depending on the
host species, various adjuvants may be used to increase immunological response.

In one procedure, peptides about 15 residues in length may be synthesized using an ABI
431A peptide synthesizer (PE Biosystems) using fmoc-chemistry and coupled to KLH (Sigma) by
reaction with M-maleimidobenzoyl-N-hydroxysuccinimide ester (Ausubel, 1995, supra). Rabbits are
immunized with the peptide-KLH complex in complete Freund's adjuvant. The resulting antisera are
tested for antipeptide activity by binding the peptide to plastic, blocking with 1% bovine serum
albumin (BSA), reacting with rabbit antisera, washing, and reacting with radioiodinated goat anti-rabbit
IgG. Antisera with antipeptide activity are tested for anti-MDDT activity using protocols well
known in the art, including ELISA, radioimmunoassay (RIA), and immunoblotting.

In another procedure, isolated and purified peptide may be used to immunize mice (about 100
μg of peptide) or rabbits (about 1 mg of peptide). Subsequently, the peptide is radioiodinated and
used to screen the immunized animals' B-lymphocytes for production of antipeptide antibodies.
Positive cells are then used to produce hybridomas using standard techniques. About 20 mg of
peptide is sufficient for labeling and screening several thousand clones. Hybridomas of interest are
detected by screening with radioiodinated peptide to identify those fusions producing peptide-specific
monoclonal antibody. In a typical protocol, wells of a multi-well plate (FAST, Becton-Dickinson,
Palo Alto, CA) are coated with affinity-purified, specific rabbit-anti-mouse (or suitable anti-species
IgG) antibodies at 10 mg/ml. The coated wells are blocked with 1% BSA and washed and exposed to
supernatants from hybridomas. After incubation, the wells are exposed to radiolabeled peptide at 1
mg/ml.

Clones producing antibodies bind a quantity of labeled peptide that is detectable above
background. Such clones are expanded and subjected to 2 cycles of cloning. Cloned hybridomas are
injected into pristane-treated mice to produce ascites, and monoclonal antibody is purified from the
ascitic fluid by affinity chromatography on protein A (Amersham Pharmacia Biotech). Several
procedures for the production of monoclonal antibodies, including in vitro production, are described
in Pound (supra). Monoclonal antibodies with antipeptide activity are tested for anti-MDDT activity
using protocols well known in the art, including ELISA, RIA, and immunoblotting.

Antibody fragments containing specific binding sites for an epitope may also be generated.
For example, such fragments include, but are not limited to, the F(ab')2 fragments produced by pepsin
digestion of the antibody molecule, and the Fab fragments generated by reducing the disulfide bridges
of the F(ab')2 fragments. Alternatively, construction of Fab expression libraries in filamentous
bacteriophage allows rapid and easy identification of monoclonal fragments with desired specificity
(Pound, supra, Chaps. 45-47). Antibodies generated against polypeptide encoded by mddt can be used
to purify and characterize full-length MDDT protein and its activity, binding partners, etc.

Assays Using Antibodies 

Anti-MDDT antibodies may be used in assays to quantify the amount of MDDT found in a
particular human cell. Such assays include methods utilizing the antibody and a label to detect
expression level under normal or disease conditions. The peptides and antibodies of the invention
may be used with or without modification or labeled by joining them, either covalently or
noncovalently, with a reporter molecule.

Protocols for detecting and measuring protein expression using either polyclonal or
monoclonal antibodies are well known in the art. Examples include ELISA, RIA, and fluorescent
activated cell sorting (FACS). Such immunoassays typically involve the formation of complexes
between the MDDT and its specific antibody and the measurement of such complexes. These and
other assays are described in Pound (supra).



Without further elaboration, it is believed that one skilled in the art can, using the preceding
description, utilize the present invention to its fullest extent. The following preferred specific
embodiments are, therefore, to be construed as merely illustrative, and not limitative of the remainder
of the disclosure in any way whatsoever.

The disclosures of all patents, applications, and publications mentioned above and below, in
particular U.S. Provisional Application No. 60/137,412, filed June 3, 1999, U.S. Provisional
Application No. 60/147,542, filed August 5, 1999, U.S. Provisional Application No. 60/147,501, filed
August 5, 1999, and U.S. Provisional Application No. 60/147,500, filed August 5, 1999, are hereby
expressly incorporated by reference.

EXAMPLES 

I. Construction of cDNA Libraries 

RNA was purchased from CLONTECH Laboratories, Inc. (Palo Alto CA) or isolated from 
various tissues. Some tissues were homogenized and lysed in guanidinium isothiocyanate, while 
others were homogenized and lysed in phenol or in a suitable mixture of denaturants, such as 
TRIZOL (Life Technologies), a monophasic solution of phenol and guanidine isothiocyanate. The 
resulting lysates were centrifuged over CsCl cushions or extracted with chloroform. RNA was 
precipitated with either isopropanol or sodium acetate and ethanol, or by other routine methods. 

Phenol extraction and precipitation of RNA were repeated as necessary to increase RNA
purity. In most cases, RNA was treated with DNase. For most libraries, poly(A+) RNA was isolated
using oligo d(T)-coupled paramagnetic particles (Promega Corporation (Promega), Madison WI),
OLIGOTEX latex particles (QIAGEN, Inc. (QIAGEN), Valencia CA), or an OLIGOTEX mRNA
purification kit (QIAGEN). Alternatively, RNA was isolated directly from tissue lysates using other
RNA isolation kits, e.g., the POLY(A)PURE mRNA purification kit (Ambion, Inc., Austin TX).

In some cases, Stratagene was provided with RNA and constructed the corresponding cDNA
libraries. Otherwise, cDNA was synthesized and cDNA libraries were constructed with the UNIZAP
vector system (Stratagene Cloning Systems, Inc. (Stratagene), La Jolla CA) or SUPERSCRIPT
plasmid system (Life Technologies), using the recommended procedures or similar methods known in
the art. (See, e.g., Ausubel, 1997, supra, Chapters 5.1 through 6.6.) Reverse transcription was
initiated using oligo d(T) or random primers. Synthetic oligonucleotide adapters were ligated to
double stranded cDNA, and the cDNA was digested with the appropriate restriction enzyme or
enzymes. For most libraries, the cDNA was size-selected (300-1000 bp) using SEPHACRYL S1000,
SEPHAROSE CL2B, or SEPHAROSE CL4B column chromatography (Amersham Pharmacia
Biotech) or preparative agarose gel electrophoresis. cDNAs were ligated into compatible restriction
enzyme sites of the polylinker of a suitable plasmid, e.g., PBLUESCRIPT plasmid (Stratagene),
pSPORT1 plasmid (Life Technologies), or pINCY (Incyte). Recombinant plasmids were transformed
into competent E. coli cells including XL1-Blue, XL1-BlueMRF, or SOLR from Stratagene or DH5α,
DH10B, or ElectroMAX DH10B from Life Technologies.

II. Isolation of cDNA Clones 

Plasmids were recovered from host cells by in vivo excision using the UNIZAP vector system
(Stratagene) or by cell lysis. Plasmids were purified using at least one of the following: the Magic or
WIZARD Minipreps DNA purification system (Promega); the AGTC Miniprep purification kit (Edge
BioSystems, Gaithersburg MD); and the QIAWELL 8, QIAWELL 8 Plus, and QIAWELL 8 Ultra
plasmid purification systems or the R.E.A.L. PREP 96 plasmid purification kit (QIAGEN).
Following precipitation, plasmids were resuspended in 0.1 ml of distilled water and stored, with or
without lyophilization, at 4°C.

Alternatively, plasmid DNA was amplified from host cell lysates using direct link PCR in a
high-throughput format. (Rao, V.B. (1994) Anal. Biochem. 216:1-14.) Host cell lysis and thermal
cycling steps were carried out in a single reaction mixture. Samples were processed and stored in
384-well plates, and the concentration of amplified plasmid DNA was quantified fluorometrically
using PICOGREEN dye (Molecular Probes, Inc. (Molecular Probes), Eugene OR) and a
FLUOROSKAN II fluorescence scanner (Labsystems Oy, Helsinki, Finland).

III. Sequencing and Analysis

cDNA sequencing reactions were processed using standard methods or high-throughput
instrumentation such as the ABI CATALYST 800 thermal cycler (PE Biosystems) or the PTC-200
thermal cycler (MJ Research) in conjunction with the HYDRA microdispenser (Robbins Scientific
Corp., Sunnyvale CA) or the MICROLAB 2200 liquid transfer system (Hamilton). cDNA sequencing
reactions were prepared using reagents provided by Amersham Pharmacia Biotech or supplied in ABI
sequencing kits such as the ABI PRISM BIGDYE Terminator cycle sequencing ready reaction kit (PE
Biosystems). Electrophoretic separation of cDNA sequencing reactions and detection of labeled
polynucleotides were carried out using the MEGABACE 1000 DNA sequencing system (Molecular
Dynamics); the ABI PRISM 373 or 377 sequencing system (PE Biosystems) in conjunction with
standard ABI protocols and base calling software; or other sequence analysis systems known in the
art. Reading frames within the cDNA sequences were identified using standard methods (reviewed in
Ausubel, 1997, supra, Chapter 7.7). Some of the cDNA sequences were selected for extension using
the techniques disclosed in Example VIII.

IV. Assembly and Analysis of Sequences 

Component sequences from chromatograms were subject to PHRED analysis and assigned a
quality score. The sequences having at least a required quality score were subject to various
pre-processing editing pathways to eliminate, e.g., low quality 3' ends, vector and linker sequences, poly A
tails, Alu repeats, mitochondrial and ribosomal sequences, bacterial contamination sequences, and
sequences smaller than 50 base pairs. In particular, low-information sequences and repetitive
elements (e.g., dinucleotide repeats, Alu repeats, etc.) were replaced by "n's", or masked, to prevent
spurious matches.
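
Two of the pre-processing steps above, discarding short sequences and masking repeats with "n", can be sketched in Python as follows. This is a minimal illustration rather than the pipeline actually used: the 50 base pair minimum comes from the text, while the repeat definition (a dinucleotide unit occurring at least six times in a row), the function names, and the example reads are assumptions. Vector, Alu, and contaminant screening are not modeled.

```python
import re

MIN_LENGTH = 50  # from the text: sequences smaller than 50 base pairs are eliminated

# Assumption: a two-base unit repeated 6 or more times in a row is low-information.
# Note that homopolymer runs of 12+ bases (e.g. poly A) also match this pattern.
DINUC_REPEAT = re.compile(r"([ACGT]{2})\1{5,}")


def mask_repeats(seq: str) -> str:
    """Replace each repeat run with the same number of 'n' characters."""
    return DINUC_REPEAT.sub(lambda m: "n" * len(m.group(0)), seq)


def preprocess(reads: dict[str, str]) -> dict[str, str]:
    """Drop reads shorter than MIN_LENGTH and mask repeats in the rest."""
    kept = {}
    for name, seq in reads.items():
        if len(seq) < MIN_LENGTH:
            continue
        kept[name] = mask_repeats(seq.upper())
    return kept


# Hypothetical reads: one too short, one carrying a (CA)n repeat.
reads = {
    "short_read": "ACGT" * 5,
    "repeat_read": "ACGTTGCA" * 5 + "CA" * 10 + "ACGTTGCA" * 5,
}
print(preprocess(reads))
```

Masking keeps the read length unchanged, so positions reported by later alignment steps still refer to the original coordinates.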

Processed sequences were then subject to assembly procedures in which the sequences were
assigned to gene bins (bins). Each sequence could only belong to one bin. Sequences in each gene
bin were assembled to produce consensus sequences (templates). Subsequent new sequences were
added to existing bins using BLASTn (v. 1.4 WashU) and CROSSMATCH. Candidate pairs were
identified as all BLAST hits having a quality score greater than or equal to 150. Alignments of at
least 82% local identity were accepted into the bin. The component sequences from each bin were
assembled using a version of PHRAP. Bins with several overlapping component sequences were
assembled using DEEP PHRAP. The orientation (sense or antisense) of each assembled template was
determined based on the number and orientation of its component sequences. Template sequences as
disclosed in the sequence listing correspond to sense strand sequences (the "forward" reading
frames), to the best determination. The complementary (antisense) strands are inherently disclosed
herein. The component sequences which were used to assemble each template consensus sequence
are listed in Table 4, along with their positions along the template nucleotide sequences.
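
The binning rule described above can be sketched in Python as single-linkage grouping over accepted alignments. The two cutoffs (quality score of at least 150, local identity of at least 82%) are taken from the text; the `Hit` record, the union-find grouping, and the example data are assumptions made for illustration and do not reproduce BLASTn, CROSSMATCH, or PHRAP.

```python
from dataclasses import dataclass

MIN_SCORE = 150       # BLAST quality score cutoff stated in the text
MIN_IDENTITY = 82.0   # percent local identity cutoff stated in the text


@dataclass
class Hit:
    """Hypothetical record for one pairwise alignment between two sequences."""
    query: str
    subject: str
    score: float
    identity: float  # percent local identity


def assign_bins(seq_ids, hits):
    """Group sequences so that each sequence belongs to exactly one bin."""
    parent = {s: s for s in seq_ids}

    def find(x):
        # Follow parent links to the bin representative, compressing the path.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for h in hits:
        if h.score >= MIN_SCORE and h.identity >= MIN_IDENTITY:
            parent[find(h.query)] = find(h.subject)

    bins = {}
    for s in seq_ids:
        bins.setdefault(find(s), []).append(s)
    return sorted(bins.values())


hits = [
    Hit("a", "b", 200, 90.0),  # accepted: passes both cutoffs
    Hit("b", "c", 140, 99.0),  # rejected: score below 150
    Hit("c", "d", 300, 80.0),  # rejected: identity below 82%
]
print(assign_bins(["a", "b", "c", "d"], hits))
```

Because every sequence maps to a single representative, the sketch preserves the constraint that each sequence belongs to only one bin.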

Bins were compared against each other and those having local similarity of at least 82% were
combined and reassembled. Reassembled bins having templates of insufficient overlap (less than
95% local identity) were re-split. Assembled templates were also subject to analysis by
STITCHER/EXON MAPPER algorithms which analyze the probabilities of the presence of splice
variants, alternatively spliced exons, splice junctions, differential expression of alternative spliced
genes across tissue types or disease states, etc. These resulting bins were subject to several rounds of
the above assembly procedures.

Once gene bins were generated based upon sequence alignments, bins were clone joined
based upon clone information. If the 5' sequence of one clone was present in one bin and the 3'
sequence from the same clone was present in a different bin, it was likely that the two bins actually
belonged together in a single bin. The resulting combined bins underwent assembly procedures to
regenerate the consensus sequences.
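
Clone joining as described above can be sketched in Python as merging the bins that hold the 5' and 3' reads of the same clone. The data layout (a read-to-bin mapping and a clone-to-read-pair mapping), the function name, and the identifiers in the example are hypothetical; reassembly of the merged bins is not shown.

```python
def clone_join(read_bin, clone_ends):
    """Merge bins linked by the two end reads of one clone.

    read_bin:   read id -> bin id
    clone_ends: clone id -> (5' read id, 3' read id)
    Returns a new read id -> bin id mapping.
    """
    merged = dict(read_bin)
    for five_prime, three_prime in clone_ends.values():
        if five_prime not in merged or three_prime not in merged:
            continue  # one end was never binned, so there is nothing to join
        keep, drop = merged[five_prime], merged[three_prime]
        if keep == drop:
            continue  # both ends already share a bin
        # Move every read of the 3' bin into the 5' bin.
        for read, bin_id in list(merged.items()):
            if bin_id == drop:
                merged[read] = keep
    return merged


# Hypothetical example: clone c1 has its 5' read in bin 1 and its 3' read in bin 2.
read_bin = {"c1.5p": 1, "c1.3p": 2, "other_read": 2, "unrelated": 3}
clone_ends = {"c1": ("c1.5p", "c1.3p")}
print(clone_join(read_bin, clone_ends))
```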

The final assembled templates were subsequently annotated using the following procedure.
Template sequences were analyzed using BLASTn (v2.0, NCBI) versus gbpri (GenBank version 116).
"Hits" were defined as an exact match having from 95% local identity over 200 base pairs through
100% local identity over 100 base pairs, or a homolog match having an E-value, i.e. a probability
score, of < 1 x 10 *. The hits were subject to frameshift FASTx versus GENPEPT (GenBank version
116). (See Table 5). In this analysis, a homolog match was defined as having an E-value of < 1 x 10".
The assembly method used above was described in "System and Methods for Analyzing
Biomolecular Sequences," U.S.S.N. 09/276,534, filed March 25, 1999, and the LIFESEQ Gold user
manual (Incyte), both incorporated by reference herein.

Following assembly, template sequences were subjected to motif, BLAST, and functional
analyses, and categorized in protein hierarchies using methods described in, e.g., "Database System
Employing Protein Function Hierarchies for Viewing Biomolecular Sequence Data," U.S.S.N.
08/812,290, filed March 6, 1997; "Relational Database for Storing Biomolecule Information,"
U.S.S.N. 08/947,845, filed October 9, 1997; "Project-Based Full-Length Biomolecular Sequence
Database," U.S.S.N. 08/811,758, filed March 6, 1997; and "Relational Database and System for
Storing Information Relating to Biomolecular Sequences," U.S.S.N. 09/034,807, filed March 4, 1998,
all of which are incorporated by reference herein.

The template sequences were further analyzed by translating each template in all three
forward reading frames and searching each translation against the Pfam database of hidden Markov
model-based protein families and domains using the HMMER software package (available to the
public from Washington University School of Medicine, St. Louis MO). Regions of templates which,
when translated, contain similarity to Pfam consensus sequences are reported in Table 2, along with
descriptions of Pfam protein domains and families. Only those Pfam hits with an E-value of
< 1 x 10^[exponent illegible] are reported. (See also World Wide Web site http://pfam.wustl.edu/
for detailed descriptions of Pfam protein domains and families.)
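
The three-forward-frame translation step described above can be illustrated with a minimal
Python sketch. It assumes the standard genetic code and plain uppercase or lowercase DNA input;
the function names are hypothetical, and the sketch does not reproduce the HMMER search or any
Incyte software. The frame labels follow the "forward 1/2/3" convention used in Table 3.

```python
# Standard genetic code, built from the conventional T, C, A, G base ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}


def translate(seq, frame):
    """Translate seq from offset frame (0, 1, or 2); unknown codons become 'X'."""
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def forward_translations(seq):
    """Return the three forward-frame translations keyed 'forward 1' to 'forward 3'."""
    return {f"forward {frame + 1}": translate(seq, frame) for frame in range(3)}
```

Each of the three translations would then be searched separately against the protein family
models; stop codons are kept as '*' so that the coordinates of a hit can be mapped back to the
template.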
Additionally, the template sequences were translated in all three forward reading frames, and
each translation was searched against hidden Markov models for signal peptide and transmembrane
domains using the HMMER software package. Construction of hidden Markov models and their
usage in sequence analysis has been described. (See, for example, Eddy, S.R. (1996) Curr. Opin. Str.
Biol. 6:361-365.) Regions of templates which, when translated, contain similarity to signal peptide or
transmembrane domain consensus sequences are reported in Table 3. Only those signal peptide or
transmembrane hits with a cutoff score of 11 bits or greater are reported. A cutoff score of 11 bits or
greater corresponds to at least about 91-94% true-positives in signal peptide prediction, and at least
about 75% true-positives in transmembrane domain prediction.

The results of HMMER analysis as reported in Tables 2 and 3 may support the results of
BLAST analysis as reported in Table 1 or may suggest alternative or additional properties of
template-encoded polypeptides not previously uncovered by BLAST or other analyses.

Template sequences are further analyzed using the bioinformatics tools listed in Table 5, or
using sequence analysis software known in the art such as MACDNASIS PRO software (Hitachi
Software Engineering, South San Francisco CA) and LASERGENE software (DNASTAR).
Template sequences may be further queried against public databases such as the GenBank rodent,
mammalian, vertebrate, prokaryote, and eukaryote databases.

V. Analysis of Polynucleotide Expression 

Northern analysis is a laboratory technique used to detect the presence of a transcript of a
gene and involves the hybridization of a labeled nucleotide sequence to a membrane on which RNAs
from a particular cell type or tissue have been bound. (See, e.g., Sambrook, supra, ch. 7; Ausubel,
1995, supra, ch. 4 and 16.)

Analogous computer techniques applying BLAST were used to search for identical or related
molecules in cDNA databases such as GenBank or LIFESEQ (Incyte Pharmaceuticals). This analysis
is much faster than multiple membrane-based hybridizations. In addition, the sensitivity of the
computer search can be modified to determine whether any particular match is categorized as exact or
similar. The basis of the search is the product score, which is defined as:

    Product score = (BLAST Score x Percent Identity) / (5 x minimum{length(Seq. 1), length(Seq. 2)})


The product score takes into account both the degree of similarity between two sequences and the
length of the sequence match. The product score is a normalized value between 0 and 100, and is
calculated as follows: the BLAST score is multiplied by the percent nucleotide identity and the
product is divided by (5 times the length of the shorter of the two sequences). The BLAST score is
calculated by assigning a score of +5 for every base that matches in a high-scoring segment pair
(HSP), and -4 for every mismatch. Two sequences may share more than one HSP (separated by
gaps). If there is more than one HSP, then the pair with the highest BLAST score is used to calculate
the product score. The product score represents a balance between fractional overlap and quality in a
BLAST alignment. For example, a product score of 100 is produced only for 100% identity over the
entire length of the shorter of the two sequences being compared. A product score of 70 is produced
either by 100% identity and 70% overlap at one end, or by 88% identity and 100% overlap at the
other. A product score of 50 is produced either by 100% identity and 50% overlap at one end, or 79%
identity and 100% overlap.
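
The product score calculation described above can be sketched in a few lines of Python. This
is an illustrative sketch only: the function names are hypothetical, the HSP is assumed to be
ungapped and scored exactly as stated (+5 per match, -4 per mismatch), and the code is not
taken from BLAST or from any Incyte software.

```python
def hsp_score(matches, mismatches):
    """BLAST score of one ungapped HSP: +5 per matching base, -4 per mismatch."""
    return 5 * matches - 4 * mismatches


def product_score(blast_score, percent_identity, len_seq1, len_seq2):
    """Product score: BLAST score x percent identity / (5 x length of the shorter sequence)."""
    return blast_score * percent_identity / (5 * min(len_seq1, len_seq2))


# Worked cases for a shorter sequence of 100 bases (the longer sequence has 250 bases):
full = product_score(hsp_score(100, 0), 100.0, 100, 250)      # 100% identity, full overlap
partial = product_score(hsp_score(70, 0), 100.0, 100, 250)    # 100% identity, 70% overlap
divergent = product_score(hsp_score(88, 12), 88.0, 100, 250)  # 88% identity, full overlap
```

Under these assumptions the first two cases give exactly 100 and 70, and the third gives about
69, which is consistent with the approximate value of 70 quoted in the text for 88% identity
over the full length.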

VI. Tissue Distribution Profiling

A tissue distribution profile is determined for each template by compiling the cDNA library
tissue classifications of its component cDNA sequences. Each component sequence is derived from
a cDNA library constructed from a human tissue. Each human tissue is classified into one of the
following categories: cardiovascular system; connective tissue; digestive system; embryonic
structures; endocrine system; exocrine glands; genitalia, female; genitalia, male; germ cells; hemic
and immune system; liver; musculoskeletal system; nervous system; pancreas; respiratory system;
sense organs; skin; stomatognathic system; unclassified/mixed; or urinary tract. Template sequences,
component sequences, and cDNA library/tissue information are found in the LIFESEQ GOLD
database (Incyte Genomics, Palo Alto CA).
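
As a rough illustration of how such a profile might be compiled, the following Python sketch
tallies the tissue category of the library behind each component sequence. The library names,
the input layout, and the choice to report fractions rather than raw counts are assumptions made
for the example; the text states only that the classifications are compiled.

```python
from collections import Counter


def tissue_profile(component_libraries, library_tissue):
    """Return the fraction of component sequences falling in each tissue category.

    component_libraries: one library name per component sequence of the template.
    library_tissue: mapping from library name to its tissue category.
    """
    counts = Counter(library_tissue[lib] for lib in component_libraries)
    total = sum(counts.values())
    return {tissue: n / total for tissue, n in counts.items()}


# Hypothetical libraries and classifications, for illustration only.
example_tissues = {"LIB_A": "nervous system", "LIB_B": "liver", "LIB_C": "nervous system"}
example_profile = tissue_profile(["LIB_A", "LIB_A", "LIB_B", "LIB_C"], example_tissues)
```

In this example three of the four component sequences come from libraries classified as nervous
system, so that category accounts for 0.75 of the profile and liver for 0.25.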

VII. Transcript Image Analysis

Transcript images are generated as described in Seilhamer et al., "Comparative Gene
Transcript Analysis," U.S. Patent Number 5,840,484, incorporated herein by reference.

VIII. Extension of Polynucleotide Sequences and Isolation of a Full-length cDNA

Oligonucleotide primers designed using an mddt of the Sequence Listing are used to extend
the nucleic acid sequence. One primer is synthesized to initiate 5' extension of the template, and the
other primer, to initiate 3' extension of the template. The initial primers may be designed using
OLIGO 4.06 software (National Biosciences, Inc. (National Biosciences), Plymouth MN), or another
appropriate program, to be about 22 to 30 nucleotides in length, to have a GC content of about 50% or
more, and to anneal to the target sequence at temperatures of about 68°C to about 72°C. Any stretch
of nucleotides which would result in hairpin structures and primer-primer dimerizations are avoided.
Selected human cDNA libraries are used to extend the sequence. If more than one extension is
necessary or desired, additional or nested sets of primers are designed.
High fidelity amplification is obtained by PCR using methods well known in the art. PCR is
performed in 96-well plates using the PTC-200 thermal cycler (MJ Research). The reaction mix
contains DNA template, 200 nmol of each primer, reaction buffer containing Mg2+, (NH4)2SO4, and
β-mercaptoethanol, Taq DNA polymerase (Amersham Pharmacia Biotech), ELONGASE enzyme (Life
Technologies), and Pfu DNA polymerase (Stratagene), with the following parameters for primer pair
PCI A and PCI B: Step 1: 94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 68°C, 2
min; Step 5: Steps 2, 3, and 4 repeated 20 times; Step 6: 68°C, 5 min; Step 7: storage at 4°C. In the
alternative, the parameters for primer pair T7 and SK+ are as follows: Step 1: 94°C, 3 min; Step 2:
94°C, 15 sec; Step 3: 57°C, 1 min; Step 4: 68°C, 2 min; Step 5: Steps 2, 3, and 4 repeated 20 times;
Step 6: 68°C, 5 min; Step 7: storage at 4°C.

The concentration of DNA in each well is determined by dispensing 100 µl PICOGREEN
quantitation reagent (0.25% (v/v); Molecular Probes) dissolved in 1X Tris-EDTA (TE) and 0.5 µl of
undiluted PCR product into each well of an opaque fluorimeter plate (Corning Incorporated
(Corning), Corning NY), allowing the DNA to bind to the reagent. The plate is scanned in a
FLUOROSKAN II (Labsystems Oy) to measure the fluorescence of the sample and to quantify the
concentration of DNA. A 5 µl to 10 µl aliquot of the reaction mixture is analyzed by electrophoresis
on a 1% agarose mini-gel to determine which reactions are successful in extending the sequence.

The extended nucleotides are desalted and concentrated, transferred to 384-well plates,
digested with CviJI cholera virus endonuclease (Molecular Biology Research, Madison WI), and
sonicated or sheared prior to religation into pUC 18 vector (Amersham Pharmacia Biotech). For
shotgun sequencing, the digested nucleotides are separated on low concentration (0.6 to 0.8%)
agarose gels, fragments are excised, and agar digested with AGAR ACE (Promega). Extended clones
are religated using T4 ligase (New England Biolabs, Inc., Beverly MA) into pUC 18 vector
(Amersham Pharmacia Biotech), treated with Pfu DNA polymerase (Stratagene) to fill in restriction
site overhangs, and transfected into competent E. coli cells. Transformed cells are selected on
antibiotic-containing media, and individual colonies are picked and cultured overnight at 37°C in
384-well plates in LB/2x carbenicillin liquid media.

The cells are lysed, and DNA is amplified by PCR using Taq DNA polymerase (Amersham
Pharmacia Biotech) and Pfu DNA polymerase (Stratagene) with the following parameters: Step 1:
94°C, 3 min; Step 2: 94°C, 15 sec; Step 3: 60°C, 1 min; Step 4: 72°C, 2 min; Step 5: Steps 2, 3, and 4
repeated 29 times; Step 6: 72°C, 5 min; Step 7: storage at 4°C. DNA is quantified by PICOGREEN
reagent (Molecular Probes) as described above. Samples with low DNA recoveries are reamplified
using the same conditions as described above. Samples are diluted with 20% dimethylsulfoxide (1:2,
v/v), and sequenced using DYENAMIC energy transfer sequencing primers and the DYENAMIC
DIRECT kit (Amersham Pharmacia Biotech) or the ABI PRISM BIGDYE Terminator cycle
sequencing ready reaction kit (PE Biosystems).

In like manner, the mddt is used to obtain regulatory sequences (promoters, introns, and
enhancers) using the procedure above, oligonucleotides designed for such extension, and an
appropriate genomic library.

IX. Labeling of Probes and Southern Hybridization Analyses

Hybridization probes derived from the mddt of the Sequence Listing are employed for
screening cDNAs, mRNAs, or genomic DNA. The labeling of probe nucleotides between 100 and
1000 nucleotides in length is specifically described, but essentially the same procedure may be used
with larger cDNA fragments. Probe sequences are labeled at room temperature for 30 minutes using
a T4 polynucleotide kinase, γ-32P-ATP, and 0.5X One-Phor-All Plus (Amersham Pharmacia Biotech)
buffer and purified using a ProbeQuant G-50 Microcolumn (Amersham Pharmacia Biotech). The
probe mixture is diluted to 10^[exponent illegible] dpm/µg/ml hybridization buffer and used in a
typical membrane-based hybridization analysis.

The DNA is digested with a restriction endonuclease such as Eco RV and is electrophoresed
through a 0.7% agarose gel. The DNA fragments are transferred from the agarose to nylon membrane
(NYTRAN Plus, Schleicher & Schuell, Inc., Keene NH) using procedures specified by the
manufacturer of the membrane. Prehybridization is carried out for three or more hours at 68°C, and
hybridization is carried out overnight at 68°C. To remove non-specific signals, blots are sequentially
washed at room temperature under increasingly stringent conditions, up to 0.1x saline sodium citrate
(SSC) and 0.5% sodium dodecyl sulfate. After the blots are placed in a PHOSPHORIMAGER
cassette (Molecular Dynamics) or are exposed to autoradiography film, hybridization patterns of
standard and experimental lanes are compared. Essentially the same procedure is employed when
screening RNA.

X. Chromosome Mapping of mddt 

The cDNA sequences which were used to assemble SEQ ID NO:1-14 are compared with
sequences from the Incyte LIFESEQ database and public domain databases using BLAST and other
implementations of the Smith-Waterman algorithm. Sequences from these databases that match SEQ
ID NO:1-14 are assembled into clusters of contiguous and overlapping sequences using assembly
algorithms such as PHRAP (Table 5). Radiation hybrid and genetic mapping data available from
public resources such as the Stanford Human Genome Center (SHGC), Whitehead Institute for
Genome Research (WIGR), and Genethon are used to determine if any of the clustered sequences
have been previously mapped. Inclusion of a mapped sequence in a cluster will result in the
assignment of all sequences of that cluster, including its particular SEQ ID NO:, to that map location.
The genetic map locations of SEQ ID NO:1-14 are described as ranges, or intervals, of human
chromosomes. The map position of an interval, in centiMorgans, is measured relative to the terminus
of the chromosome's p-arm. (The centiMorgan (cM) is a unit of measurement based on recombination
frequencies between chromosomal markers. On average, 1 cM is roughly equivalent to 1 megabase
(Mb) of DNA in humans, although this can vary widely due to hot and cold spots of recombination.)
The cM distances are based on genetic markers mapped by Genethon which provide boundaries for
radiation hybrid markers whose sequences were included in each of the clusters.
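
The rule that one mapped sequence assigns its location to every member of its cluster can be
sketched as follows in Python. The data layout (clusters as lists of identifiers, locations as
chromosome and cM interval tuples), the identifiers, and the choice to take the first mapped
member when several are mapped are all assumptions made for illustration; the text does not say
how conflicting locations within one cluster are handled.

```python
def assign_map_locations(clusters, known_locations):
    """Propagate a known map location to every sequence in the same cluster.

    clusters: mapping from cluster id to a list of sequence identifiers.
    known_locations: mapping from previously mapped sequence id to its location.
    Clusters with no mapped member are left unassigned.
    """
    assigned = {}
    for members in clusters.values():
        mapped = [known_locations[m] for m in members if m in known_locations]
        if not mapped:
            continue
        location = mapped[0]  # assumption: the first mapped member decides
        for member in members:
            assigned[member] = location
    return assigned


# Hypothetical identifiers and a hypothetical interval (chromosome, start cM, end cM).
example_clusters = {"c1": ["SEQ_1", "marker_A", "est_9"], "c2": ["SEQ_2", "est_4"]}
example_known = {"marker_A": ("7", 112.0, 118.5)}
example_assigned = assign_map_locations(example_clusters, example_known)
```

Here every member of cluster c1 inherits the interval of marker_A, while cluster c2, which
contains no previously mapped sequence, receives no location.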

XI. Microarray Analysis 

Probe Preparation from Tissue or Cell Samples

Total RNA is isolated from tissue samples using the guanidinium thiocyanate method and
polyA+ RNA is purified using the oligo (dT) cellulose method. Each polyA+ RNA sample is reverse
transcribed using MMLV reverse-transcriptase, 0.05 pg/µl oligo-dT primer (21mer), 1X first strand
buffer, 0.03 units/µl RNase inhibitor, 500 µM dATP, 500 µM dGTP, 500 µM dTTP, 40 µM dCTP, and
40 µM dCTP-Cy3 (BDS) or dCTP-Cy5 (Amersham Pharmacia Biotech). The reverse transcription
reaction is performed in a 25 µl volume containing 200 ng polyA+ RNA with GEMBRIGHT kits
(Incyte). Specific control polyA+ RNAs are synthesized by in vitro transcription from non-coding
yeast genomic DNA (W. Lei, unpublished). As quantitative controls, the control mRNAs at 0.002 ng,
0.02 ng, 0.2 ng, and 2 ng are diluted into reverse transcription reaction at ratios of 1:100,000,
1:10,000, 1:1000, and 1:100 (w/w) to sample mRNA respectively. The control mRNAs are diluted into
reverse transcription reaction at ratios of 1:3, 3:1, 1:10, 10:1, 1:25, and 25:1 (w/w) to sample mRNA
differential expression patterns. After incubation at 37°C for 2 hr, each reaction sample (one with
Cy3 and another with Cy5 labeling) is treated with 2.5 µl of 0.5M sodium hydroxide and incubated
for 20 minutes at 85°C to stop the reaction and degrade the RNA. Probes are purified using two
successive CHROMA SPIN 30 gel filtration spin columns (CLONTECH Laboratories, Inc.
(CLONTECH), Palo Alto CA) and after combining, both reaction samples are ethanol precipitated
using 1 µl of glycogen (1 mg/ml), 60 µl sodium acetate, and 300 µl of 100% ethanol. The probe is
then dried to completion using a SpeedVAC (Savant Instruments Inc., Holbrook NY) and
resuspended in 14 µl 5X SSC/0.2% SDS.

Microarray Preparation

Sequences of the present invention are used to generate array elements. Each array element
is amplified from bacterial cells containing vectors with cloned cDNA inserts. PCR amplification
uses primers complementary to the vector sequences flanking the cDNA insert. Array elements are
amplified in thirty cycles of PCR from an initial quantity of 1-2 ng to a final quantity greater than 5
µg. Amplified array elements are then purified using SEPHACRYL-400 (Amersham Pharmacia
Biotech).

Purified array elements are immobilized on polymer-coated glass slides. Glass microscope
slides (Corning) are cleaned by ultrasound in 0.1% SDS and acetone, with extensive distilled water
washes between and after treatments. Glass slides are etched in 4% hydrofluoric acid (VWR
Scientific Products Corporation (VWR), West Chester, PA), washed extensively in distilled water,
and coated with 0.05% aminopropyl silane (Sigma) in 95% ethanol. Coated slides are cured in a
110°C oven.

Array elements are applied to the coated glass substrate using a procedure described in US
Patent No. 5,807,522, incorporated herein by reference. 1 µl of the array element DNA, at an average
concentration of 100 ng/µl, is loaded into the open capillary printing element by a high-speed robotic
apparatus. The apparatus then deposits about 5 nl of array element sample per slide.

Microarrays are UV-crosslinked using a STRATALINKER UV-crosslinker (Stratagene).
Microarrays are washed at room temperature once in 0.2% SDS and three times in distilled water.
Non-specific binding sites are blocked by incubation of microarrays in 0.2% casein in phosphate
buffered saline (PBS) (Tropix, Inc., Bedford, MA) for 30 minutes at 60°C followed by washes in
0.2% SDS and distilled water as before.

Hybridization 

Hybridization reactions contain 9 µl of probe mixture consisting of 0.2 µg each of Cy3 and
Cy5 labeled cDNA synthesis products in 5X SSC, 0.2% SDS hybridization buffer. The probe mixture
is heated to 65°C for 5 minutes and is aliquoted onto the microarray surface and covered with a 1.8
cm2 coverslip. The arrays are transferred to a waterproof chamber having a cavity just slightly larger
than a microscope slide. The chamber is kept at 100% humidity internally by the addition of 140 µl
of 5X SSC in a corner of the chamber. The chamber containing the arrays is incubated for about 6.5
hours at 60°C. The arrays are washed for 10 min at 45°C in a first wash buffer (1X SSC, 0.1% SDS),
three times for 10 minutes each at 45°C in a second wash buffer (0.1X SSC), and dried.

Detection 

Reporter-labeled hybridization complexes are detected with a microscope equipped with an
Innova 70 mixed gas 10 W laser (Coherent, Inc., Santa Clara CA) capable of generating spectral lines
at 488 nm for excitation of Cy3 and at 632 nm for excitation of Cy5. The excitation laser light is
focused on the array using a 20X microscope objective (Nikon, Inc., Melville NY). The slide
containing the array is placed on a computer-controlled X-Y stage on the microscope and
raster-scanned past the objective. The 1.8 cm x 1.8 cm array used in the present example is scanned
with a resolution of 20 micrometers.

In two separate scans, a mixed gas multiline laser excites the two fluorophores sequentially.
Emitted light is split, based on wavelength, into two photomultiplier tube detectors (PMT R1477,
Hamamatsu Photonics Systems, Bridgewater NJ) corresponding to the two fluorophores. Appropriate
filters positioned between the array and the photomultiplier tubes are used to filter the signals. The
emission maxima of the fluorophores used are 565 nm for Cy3 and 650 nm for Cy5. Each array is
typically scanned twice, one scan per fluorophore using the appropriate filters at the laser source,
although the apparatus is capable of recording the spectra from both fluorophores simultaneously.

The sensitivity of the scans is typically calibrated using the signal intensity generated by a
cDNA control species added to the probe mix at a known concentration. A specific location on the
array contains a complementary DNA sequence, allowing the intensity of the signal at that location to
be correlated with a weight ratio of hybridizing species of 1:100,000. When two probes from
different sources (e.g., representing test and control cells), each labeled with a different fluorophore,
are hybridized to a single array for the purpose of identifying genes that are differentially expressed,
the calibration is done by labeling samples of the calibrating cDNA with the two fluorophores and
adding identical amounts of each to the hybridization mixture.

The output of the photomultiplier tube is digitized using a 12-bit RTI-835H analog-to-digital
(A/D) conversion board (Analog Devices, Inc., Norwood, MA) installed in an IBM-compatible PC
computer. The digitized data are displayed as an image where the signal intensity is mapped using a
linear 20-color transformation to a pseudocolor scale ranging from blue (low signal) to red (high
signal). The data are also analyzed quantitatively. Where two different fluorophores are excited and
measured simultaneously, the data are first corrected for optical crosstalk (due to overlapping
emission spectra) between the fluorophores using each fluorophore's emission spectrum.
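
A linear 20-color mapping of 12-bit intensities can be sketched as follows in Python. The
even binning of the 0 to 4095 range and the straight blue-to-red RGB interpolation are
assumptions made for illustration; the text specifies only that the transformation is linear,
uses 20 colors, and runs from blue (low signal) to red (high signal).

```python
ADC_LEVELS = 4096  # a 12-bit converter yields values 0..4095
N_COLORS = 20


def color_bin(value):
    """Map a 12-bit intensity to one of 20 equally wide bins (0 = lowest signal)."""
    if not 0 <= value < ADC_LEVELS:
        raise ValueError("intensity outside the 12-bit range")
    return value * N_COLORS // ADC_LEVELS


def bin_to_rgb(bin_index):
    """Interpolate linearly from blue (bin 0) to red (bin 19); an assumed palette."""
    red = round(255 * bin_index / (N_COLORS - 1))
    return (red, 0, 255 - red)


def pseudocolor(value):
    """Return the RGB display color for a digitized intensity."""
    return bin_to_rgb(color_bin(value))
```

With this binning each color covers about 205 consecutive intensity levels, so the display
shows coarse intensity classes while the quantitative analysis works on the underlying values.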

A grid is superimposed over the fluorescence signal image such that the signal from each spot
is centered in each element of the grid. The fluorescence signal within each element is then
integrated to obtain a numerical value corresponding to the average intensity of the signal. The
software used for signal analysis is the GEMTOOLS gene expression analysis program (Incyte).
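
The grid integration step can be illustrated with a short Python sketch that averages the
pixel intensities inside each square grid element. It assumes a regular grid of square elements
aligned to the image origin and an image stored as a list of rows; it is not the GEMTOOLS
algorithm, whose details are not given in the text.

```python
def integrate_grid(image, cell_size):
    """Return the mean intensity of each cell_size x cell_size grid element.

    image: list of equal-length rows of pixel intensities.
    Pixels beyond the last complete grid element are ignored.
    """
    n_rows = len(image) // cell_size
    n_cols = len(image[0]) // cell_size
    means = []
    for grid_row in range(n_rows):
        row_means = []
        for grid_col in range(n_cols):
            pixels = [
                image[r][c]
                for r in range(grid_row * cell_size, (grid_row + 1) * cell_size)
                for c in range(grid_col * cell_size, (grid_col + 1) * cell_size)
            ]
            row_means.append(sum(pixels) / len(pixels))
        means.append(row_means)
    return means


# A 4 x 4 pixel image divided into four 2 x 2 grid elements.
example_image = [
    [1, 1, 5, 5],
    [1, 1, 5, 5],
    [0, 4, 2, 2],
    [4, 0, 2, 2],
]
example_means = integrate_grid(example_image, 2)
```

For the example image the four grid elements have mean intensities of 1, 5, 2, and 2, one
value per spot position.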

XII. Complementary Nucleic Acids 
Sequences complementary to the mddt are used to detect, decrease, or inhibit expression of
the naturally occurring nucleotide. The use of oligonucleotides comprising from about 15 to 30 base
pairs is typical in the art. However, smaller or larger sequence fragments can also be used.
Appropriate oligonucleotides are designed from the mddt using OLIGO 4.06 software (National
Biosciences) or other appropriate programs and are synthesized using methods standard in the art or
ordered from a commercial supplier. To inhibit transcription, a complementary oligonucleotide is
designed from the most unique 5' sequence and used to prevent transcription factor binding to the
promoter sequence. To inhibit translation, a complementary oligonucleotide is designed to prevent
ribosomal binding and processing of the transcript.

XIII. Expression of MDDT

Expression and purification of MDDT is accomplished using bacterial or virus-based
expression systems. For expression of MDDT in bacteria, cDNA is subcloned into an appropriate
vector containing an antibiotic resistance gene and an inducible promoter that directs high levels of
cDNA transcription. Examples of such promoters include, but are not limited to, the trp-lac (tac)
hybrid promoter and the T5 or T7 bacteriophage promoter in conjunction with the lac operator
regulatory element. Recombinant vectors are transformed into suitable bacterial hosts, e.g.,
BL21(DE3). Antibiotic resistant bacteria express MDDT upon induction with isopropyl beta-D-
thiogalactopyranoside (IPTG). Expression of MDDT in eukaryotic cells is achieved by infecting
insect or mammalian cell lines with recombinant Autographa californica nuclear polyhedrosis virus
(AcMNPV), commonly known as baculovirus. The nonessential polyhedrin gene of baculovirus is
replaced with cDNA encoding MDDT by either homologous recombination or bacterial-mediated
transposition involving transfer plasmid intermediates. Viral infectivity is maintained and the strong
polyhedrin promoter drives high levels of cDNA transcription. Recombinant baculovirus is used to
infect Spodoptera frugiperda (Sf9) insect cells in most cases, or human hepatocytes, in some cases.
Infection of the latter requires additional genetic modifications to baculovirus. (See e.g., Engelhard,
supra; and Sandig, supra.)

In most expression systems, MDDT is synthesized as a fusion protein with, e.g., glutathione
S-transferase (GST) or a peptide epitope tag, such as FLAG or 6-His, permitting rapid, single-step,
affinity-based purification of recombinant fusion protein from crude cell lysates. GST, a
26-kilodalton enzyme from Schistosoma japonicum, enables the purification of fusion proteins on
immobilized glutathione under conditions that maintain protein activity and antigenicity (Amersham
Pharmacia Biotech). Following purification, the GST moiety can be proteolytically cleaved from
MDDT at specifically engineered sites. FLAG, an 8-amino acid peptide, enables immunoaffinity
purification using commercially available monoclonal and polyclonal anti-FLAG antibodies (Eastman
Kodak Company, Rochester NY). 6-His, a stretch of six consecutive histidine residues, enables
purification on metal-chelate resins (QIAGEN). Methods for protein expression and purification are
discussed in Ausubel (1995, supra, Chapters 10 and 16). Purified MDDT obtained by these methods
can be used directly in the following activity assay.

XIV. Demonstration of MDDT Activity 

MDDT, or biologically active fragments thereof, are labeled with Bolton-Hunter reagent.
(See, e.g., Bolton, A.E. and W.M. Hunter (1973) Biochem. J. 133:529-539.) Candidate molecules
previously arrayed in the wells of a multi-well plate are incubated with the labeled MDDT, washed,
and any wells with labeled MDDT complex are assayed. Data obtained using different
concentrations of MDDT are used to calculate values for the number, affinity, and association of
MDDT with the candidate molecules.

Alternatively, molecules interacting with MDDT are analyzed using the yeast two-hybrid 
system as described in Fields, S. and O. Song (1989) Nature 340:245-246, or using commercially 
available kits based on the two-hybrid system, such as the MATCHMAKER system (CLONTECH). 

MDDT may also be used in the PATHCALLING process (CuraGen Corp., New Haven CT)
which employs the yeast two-hybrid system in a high-throughput manner to determine all interactions
between the proteins encoded by two large libraries of genes (Nandabalan, K. et al. (2000) U.S.
Patent No. 6,057,101).

XV. Functional Assays 

MDDT function is assessed by expressing mddt at physiologically elevated levels in
mammalian cell culture systems. cDNA is subcloned into a mammalian expression vector containing
a strong promoter that drives high levels of cDNA expression. Vectors of choice include pCMV
SPORT (Life Technologies) and pCR3.1 (Invitrogen Corporation, Carlsbad CA), both of which
contain the cytomegalovirus promoter. 5-10 µg of recombinant vector are transiently transfected into
a human cell line, preferably of endothelial or hematopoietic origin, using either liposome
formulations or electroporation. 1-2 µg of an additional plasmid containing sequences encoding a
marker protein are co-transfected.

Expression of a marker protein provides a means to distinguish transfected cells from
nontransfected cells and is a reliable predictor of cDNA expression from the recombinant vector.
Marker proteins of choice include, e.g., Green Fluorescent Protein (GFP; CLONTECH), CD64, or a
CD64-GFP fusion protein. Flow cytometry (FCM), an automated laser optics-based technique, is
used to identify transfected cells expressing GFP or CD64-GFP and to evaluate the apoptotic state of
the cells and other cellular properties.

FCM detects and quantifies the uptake of fluorescent molecules that diagnose events
preceding or coincident with cell death. These events include changes in nuclear DNA content as
measured by staining of DNA with propidium iodide; changes in cell size and granularity as measured
by forward light scatter and 90 degree side light scatter; down-regulation of DNA synthesis as
measured by decrease in bromodeoxyuridine uptake; alterations in expression of cell surface and
intracellular proteins as measured by reactivity with specific antibodies; and alterations in plasma
membrane composition as measured by the binding of fluorescein-conjugated Annexin V protein to
the cell surface. Methods in flow cytometry are discussed in Ormerod, M. G. (1994) Flow
Cytometry, Oxford, New York NY.

The influence of MDDT on gene expression can be assessed using highly purified
populations of cells transfected with sequences encoding MDDT and either CD64 or CD64-GFP.
CD64 and CD64-GFP are expressed on the surface of transfected cells and bind to conserved regions
of human immunoglobulin G (IgG). Transfected cells are efficiently separated from nontransfected
cells using magnetic beads coated with either human IgG or antibody against CD64 (DYNAL, Inc.,
Lake Success NY). mRNA can be purified from the cells using methods well known by those of skill
in the art. Expression of mRNA encoding MDDT and other genes of interest can be analyzed by
northern analysis or microarray techniques.

XVI. Production of Antibodies 

MDDT substantially purified using polyacrylamide gel electrophoresis (PAGE; see, e.g.,
Harrington, M.G. (1990) Methods Enzymol. 182:488-495), or other purification techniques, is used to
immunize rabbits and to produce antibodies using standard protocols.

Alternatively, the MDDT amino acid sequence is analyzed using LASERGENE software
(DNASTAR) to determine regions of high immunogenicity, and a corresponding peptide is
synthesized and used to raise antibodies by means known to those of skill in the art. Methods for
selection of appropriate epitopes, such as those near the C-terminus or in hydrophilic regions, are well
described in the art. (See, e.g., Ausubel, 1995, supra, Chapter 11.)

Typically, peptides 15 residues in length are synthesized using an ABI 431A peptide
synthesizer (PE Biosystems) using fmoc-chemistry and coupled to KLH (Sigma) by reaction with
N-maleimidobenzoyl-N-hydroxysuccinimide ester (MBS) to increase immunogenicity. (See, e.g.,
Ausubel, supra.) Rabbits are immunized with the peptide-KLH complex in complete Freund's
adjuvant. Resulting antisera are tested for antipeptide activity by, for example, binding the peptide to
plastic, blocking with 1% BSA, reacting with rabbit antisera, washing, and reacting with
radio-iodinated goat anti-rabbit IgG. Antisera with antipeptide activity are tested for anti-MDDT
activity using protocols well known in the art, including ELISA, RIA, and immunoblotting.

XVII. Purification of Naturally Occurring MDDT Using Specific Antibodies

Naturally occurring or recombinant MDDT is substantially purified by immunoaffinity
chromatography using antibodies specific for MDDT. An immunoaffinity column is constructed by
covalently coupling anti-MDDT antibody to an activated chromatographic resin, such as
CNBr-activated SEPHAROSE (Amersham Pharmacia Biotech). After the coupling, the resin is
blocked and washed according to the manufacturer's instructions.

Media containing MDDT are passed over the immunoaffinity column, and the column is
washed under conditions that allow the preferential absorbance of MDDT (e.g., high ionic strength
buffers in the presence of detergent). The column is eluted under conditions that disrupt
antibody/MDDT binding (e.g., a buffer of pH 2 to pH 3, or a high concentration of a chaotrope, such
as urea or thiocyanate ion), and MDDT is collected.

All publications and patents mentioned in the above specification are herein incorporated by
reference. Various modifications and variations of the described method and system of the invention
will be apparent to those skilled in the art without departing from the scope and spirit of the
invention. Although the invention has been described in connection with specific preferred
embodiments, it should be understood that the invention as claimed should not be unduly limited to
such specific embodiments. Indeed, various modifications of the above-described modes for carrying
out the invention which are obvious to those skilled in the field of molecular biology or related fields
are intended to be within the scope of the following claims.

TABLE 3

SEQ ID NO:   Template ID   Start   Stop   Frame       Domain Type
1            222197.6      317     406    forward 2   SP
1            222197.6      901     984    forward 1   TM
2            2277D9.3      563     649    forward 2   SP
5            243096.6      3096    3182   forward 3   SP
6            244366.6      2801    2878   forward 2   TM
7            405313.4      2256    2333   forward 3   TM
7            405313.4      1503    1589   forward 3   TM

TABLE 4

SEQ ID NO- Template ID Component I 

1 222197.6 3989355H1 

1 222197.6 3989355R6 

1 222197.6 91189739 

1 222197.6 91123521 

1 222197.6 3417884H2 

1 222197.6 33989 16H1 

1 222197.6 696738H1 

1 222197.6 3387328H1 

1 222197.6 3387328F6 

1 222197.6 640954H1 

1 222197.6 640954R1 

1 222197.6 2674395H1 

1 222197.6 4871937H1 

1 222197.6 6014949H1 

1 222197.6 1310167H1 

1 222197.6 1310167F6 

1 222197.6 3422058H1 

1 222197.6 1429773H1 

1 222197.6 1429773F6 

1 222197.6 4725459H1 

1 222197.6 2692245H1 

1 222197.6 2692245F6 

1 222197.6 2658283H1 

1 222197.6 4402233H1 

1 222197.6 673783H1 

1 222197.6 487422H1 

1 222197.6 3928678H1 

1 222197.6 2641613F6 

1 222197.6 2641613H1 

1 222197.6 2770396H1 

1 222197.6 2599469H1 

1 222197.6 48611 5H1 

1 222197.6 1 62661 5H1 

1 222197.6 1626615F6 

1 222197.6 383522H] 

1 222197.6 3355867H1 

1 222197.6 3617236H1 

1 222197.6 3510978H] 

1 222197.6 1568105H1 

1 222197.6 1571377H1 

1 222197.6 3806389H1 

1 222197.6 g774888 

1 222197.6 2995341H1 

1 222197.6 5547807H1 

1 222197.6 619375H1 

1 222197.6 g 1962367 

1 222197.6 2695323H1 

1 222197.6 3142880H1 

1 222197.6 38Ce357Hl 

1 222197.6 1962884H1 

56 



Start 


Stop 


1 


122 


1 


462 


58 


533 


56 


494 


105 


341 


111 


329 


228 


480 


248 


542 


248 


705 


499 


771 


499 


841 


544 


647 


609 


809 


680 


954 


729 


952 


729 


1153 


733 


986 


748 


1016 


748 


1211 


770 


889 


774 


1025 


774 


1300 


818 


1051 


847 


1083 


847 


1089 


871 


1 123 


898 


1175 


1019 


1494 


1019 


1259 


1027 


1273 


1040 


131 1 


1181 


1456 


1247 


1456 


1247 


1728 


1260 


1526 


1261 


1532 


1286 


1573 


1300 


1567 


1325- 


1446 


1325 


1550 


1368 


1628 


1369 


1729 


1382 


1634 


1412 


1611 


1501 


1738 


1501 


1997 


1517 


1790 


1518 


1792 


1518 


1820 


1538 


1808 



WO 00/75298

PCT/US00/15344

Template ID  Component ID  Start  Stop

222197.6     1807088F6     1555   2037
222197.6     1400677H1     1628   1894
222197.6     811652H1      1650   1954
222197.6     3048110H1     1665   1918
222197.6     3048110F6     1665   1992
222197.6     3048102H1     1665   1965
222197.6     5059358H1     1668   1965
222197.6     2150138H1     1671   1925
222197.6     039720H1      1711   1970
222197.6     2434795H1     3030   3132
222197.6     2130701H1     3039   3136
222197.6     2647674H1     1740   1841
222197.6     3448915H1     1810   2065
222197.6     4377215H1     1841   2043
222197.6     4377587H1     1841   1916
222197.6     1851036H1     1841   2033
222197.6     5108761H1     1861   2107
222197.6     2893265H1     1873   2139
222197.6     1568060H1     1878   2083
222197.6     1568004H1     1878   2097
222197.6     3531152H1     1889   2207
222197.6     2851378H1     1922   2260
222197.6     1931202H1     1922   2196
222197.6     2132091R6     1936   2208
222197.6     2132091H1     1936   2096
222197.6     5288578H1     1941   2067
222197.6     g2110737      1944   2226
222197.6     2207507H1     1996   2250
222197.6     2345452H1     1996   2256
222197.6     3414767H1     2025   2265
222197.6     1306171F6     2042   2378
222197.6     1306171H1     2042   2284
222197.6     4588038H1     2066   2349
222197.6     4587760H1     2066   2221
222197.6     1389765H1     2108   2367
222197.6     g1137612      2112   2425
222197.6     2663717H1     2146   2388
222197.6     3321090H1     2152   2435
222197.6     3840347H1     2151   2339
222197.6     2415559F6     2175   2618
222197.6     2415559H1     2175   2420
222197.6     3146224H1     2178   2430
222197.6     4201740H1     2178   2451
222197.6     3713261H1     2194   2446
222197.6     5900620H1     2195   2484
222197.6     1239238H1     2205   2358
222197.6     1965353R6     2235   2691
222197.6     1965353H1     2235   2500
222197.6     1471606H1     2273   2483
222197.6     3929839H1     2278   2578






Template ID  Component ID  Start  Stop

222197.6     1521966H1     2278   2474
222197.6     1449385H1     2291   2533
222197.6     2381896H1     2307   2560
222197.6     2381895H1     2307   2559
222197.6     4643037H1     2326   2554
222197.6     3703243H1     2351   2650
222197.6     4295130H1     2350   2618
222197.6     4296184H1     2350   2589
222197.6     5841522H2     2396   2675
222197.6     3790395F6     2413   2974
222197.6     3811615H1     2413   2745
222197.6     2353349H1     2414   2513
222197.6     569817H1      2413   2660
222197.6     1621792H1     2413   2625
222197.6     520835H1      2417   2637
222197.6     g2161759      2433   2797
222197.6     1463438H1     2433   2622
222197.6     1621792T6     2437   3096
222197.6     3475011H1     2442   2682
222197.6     1969764H1     2447   2686
222197.6     4188958H1     2451   2774
222197.6     2134836H1     2458   2578
222197.6     4054390H1     2467   2749
222197.6     5185060H1     2468   2694
222197.6     4058390H1     2468   2580
222197.6     4024007H1     2469   2784
222197.6     5597388H1     2493   2771
222197.6     3934968H1     2512   2787
222197.6     2770396T6     2518   3095
222197.6     993964H1      2526   2698
222197.6     1807088T6     2531   3099
222197.6     2132091T6     2530   3101
222197.6     1965395T6     2532   3098
222197.6     1805709H1     2532   2781
222197.6     4466288H1     2537   2803
222197.6     3020435H1     2537   2821
222197.6     g2355832      2538   3035
222197.6     1672661H1     2554   2667
222197.6     1881147H1     2554   2807
222197.6     5098316H1     2573   2856
222197.6     1429773T6     2573   3089
222197.6     1626615T6     2584   3091
222197.6     1479854T6            3117
222197.6     3935053H1     2598   2897
222197.6     3930918H1     2598   2915
222197.6     1654064H1     2608   2850
222197.6     2951301H1     2619   2908
222197.6     g4223642      2627   3028
222197.6     2752320H1     2628   2928
222197.6     g2161260      2634   3031



Template ID  Component ID  Start  Stop

222197.6     1306171T6     2636   3099
222197.6     3387328T6     2640   3092
222197.6     811652T6      2639   3097
222197.6     1959841H1     2652   2915
222197.6     1959841T6     2652   3094
222197.6     1959841R6     2652   3110
222197.6     g4265714      2655   3143
222197.6     g4186863      2662   3139
222197.6     g3249761      2663   3146
222197.6     g4114970      2663   3136
222197.6     705080H1      2666   2908
222197.6     2045743H1     2673   2971
222197.6     2415559T6     2675   3097
222197.6     g4394362      2678   3067
222197.6     g2617967      2680   3140
222197.6     g1548565      2685   3028
222197.6     3790395T6     2684   3120
222197.6     g4393109      2684   3136
222197.6     1915761H1     2689   2948
222197.6     g2153774      2702   3136
222197.6     996350H1      2729   2980
222197.6     996350R1      2729   3028
222197.6     996350T1      2729   2992
222197.6     997484H1      2731   3034
222197.6     480176T6      2736   2990
222197.6     480176R6      2736   3028
222197.6     912187H1      2744   3042
222197.6     1313558H1     2744   3006
222197.6     g656198       2763   3141
222197.6     5057011H1     2784   3089
222197.6     2260766H1     2792   3062
222197.6     g3737413      2«D3   3139
222197.6     1686080H1     2810   3041
222197.6     g821623       2338   3148
222197.6     2641613T6     2833   3094
222197.6     2328044H1     2845   3113
222197.6     g4371777      2859   3141
222197.6     g1516072      2860   3144
222197.6     g2433045      2865   3091
222197.6     264&474H1     2866   3123
222197.6     2434653H1     2871   3069
222197.6     187e079H1     2881   3147
222197.6     g4109641      2896   3139
222197.6     2428514H1     2909   3098
222197.6     4146636H1     2909   3172
222197.6     4703838H1     2929   3139
222197.6     3125420H1     2934   3139
222197.6     5942277H1     2971   3137
227709.3     783646H1      1577   1867
227709.3     2314211H1     1585   1834
TABLE 4



SEQ ID NO:


Template ID


Component ID


Start


Stop


2 


227709.3 


342368H1 


1595 


1832 


2 


227709.3 


1 33324 1R6 


1595 


2004 


2 


227709.3 


•1833241 HI 


1595 


1853 


2 


227709.3 


154117H1 


1609 


1 752 


2 


227709.3 


4938186H1 


1617 


1905 


2 


227709.3 


4912871H1 


1628 


1918 


2 


227709.3 


2243937H1 


1653 


1871 


2 


227709.3 


876350H1 


1530 


1682 


2 


227709.3 


4959782H1 


1532 


1 785 


2 


227709.3 


1 839309H 1 


1663 


1974 


2 


227709.3 


1378304H1 


1695 


1940 


2 


227709.3 


1660925H1 


1696 


1936 


2 


227709.3 


1894917H1 


1696 


1912 


2 


227709.3 


23949 14H1 


1716 




2 


227709.3 


3035674H 1 


1714 


Iml 


2 


227709.3 


80857 8H1 


1714 




2 


227709.3 


30351 36H1 






2 


227709.3 


808578R1 


1714 




2 


227709.3 


3852902H1 






2 


227709.3 


3 122052 HI 


179Q 


907^ 

2075 


2 


227709.3 


6 1 07590H 1 


1700 


2046 


2 


227709.3 


5925422H 1 




2039 


2 


227709.3 


4378762H1 




2059 


2 


227709.3 


3781 168H1 




1958 


2 


227709.3 


2469542H1 


1763 




2 


227709.3 


4585384H 1 


1765 


2071 


2 


227709.3 


3772159H1 






2 


227709.3 


1436901 F6 


1787 




2 


227709.3 


1436902H] 


1787 


2081 


2 


227709.3 


1436902F1 


1787 




2 


227709.3 


732765H1 




2043 


2 


227709.3 


531886H1 


1796 




2 


227709.3 


732765R1 


1 796 


9^S0 


2 


227709.3 


323492H1 


1 796 


9079 


2 


227709.3 


1755142H1 


181 1 


9079 


2 


227709.3 


2D88594H1 


1817 




2 


227709.3 


1531927H1 


1820 


2036 


2 


227709.3 


1281210H1 


1828 


1963 


2 


227709.3 


618185H1 


1834 




2 


227709.3 


072066H1 


1833 


9079 


2 


227709.3 


920524H 1 


1837 


2173 


2 


227709.3 


g2030053 


1840 


2258 


2 


227709.3 


g681548 


1845 


2248 


2 


227709.3 


91190789 


1847 


2173 


2 


227709.3 


3052574H1 


1853 


2158 


2 


227709.3 


4546684H1 


1856 


1963 


2 


227709.3 


g 1846206 


1855 


2184 


2 


227709.3 


1231442H1 


1856 


2170 


2 


227709.3 


4546692H1 


1856 


1961 


2 


227709.3 


1231220H1 


1856 


2108 
TABLE 4



SEQ ID NO:


Template ID


Component ID


Start


Stop


2 


227709.3 


5693680H1 


1861 


2153 


2 


227709.3 


2040451 HI 


1861 


2194 


2 


227709.3 


5781387H1 


1861 


2129 


2 


227709.3 


3506145H1 


1861 


2181 


2 


227709.3 


3479633H1 


1859 


2214 


2 


227709.3 


2196589H1 


1861 


2123 


2 


227709.3 


3872090H1 


1867 


2086 


2 


227709.3 


1210943R1 


1867 


2217 


2 


227709.3 


072676H1 


1867 


2104 


2 


227709.3 


3120279H1 


1867 


2163 


2 


227709.3 


1210943H1 


1867 


2127 


2 


227709.3 


5877032H1 


1869 


2133 


2 


227709.3 


5469466H1 


1875 


2177 


2 


227709.3 


030331 HI 


1877 


2151 


2 


227709.3 


g 1970048 


1878 


2203 


2 


227709.3 


225701 2H1 


1883 


2139 


2 


227709.3 


705952H1 


1891 


2209 


2 


227709.3 


2564249H1 


1891 


2202 


2 


227709.3 


5712256H1 


1902 


2222 


2 


227709.3 


3798289H1 


1903 


2214 


2 


227709.3 


456952H1 


1903 


2171 


2 


227709.3 


693406H1 


1910 


2219 


2 


227709.3 


2403540H1 


1915 


2217 


2 


227709.3 


6095157H1 


1918 


2217 


2 


227709.3 


4257073H1 


1928 


2230 


2 


227709.3 


074494H1 


1936 


2178 


2 


227709.3 


073044H1 


1936 


2246 


2 


227709.3 


073608H1 


1936 


2217 


2 


227709.3 


5882532H1 


1937 


2217 


2 


227709.3 


073991 HI 


1936 


2227 


2 


227709.3 


073890H1 


1936 


2119 


2 


227709.3 


073335H1 


1936 


2162 


2 


227709.3 


5882935H1 


1938 


2217 


2 


227709.3 


588371 6H1 


1938 


2217 


2 


227709.3 


58812D8H1 


1939 


2217 


2 


227709.3 


58885 19H1 


1939 


2212 


2 


227709.3 


5890218m 


1939 


2212 


2 


227709.3 


4783188H1 


1938 


2225 


2 


227709.3 


2317709H1 


1941 


2222 


2 


227709.3 


1876721 HI 


1952 


2217 


2 


227709.3 


2469335H1 


1954 


2223 


2 


227709.3 


734056H1 


1953 


2076 


2 


227709.3 


2938267H1 


1957 


2217 


2 


227709.3 


3166569H1 


1957 


2217 


2 


227709.3 


4591368H1 


1966 


2227 


2 


227709.3 


2397855H1 


1966 


2240 


2 


227709.3 


6105204H1 


1965 


2217 


2 


227709.3 


874849H1 


1968 


2217 


2 


227709.3 


4458852H1 


1967 


2217 


2 


227709.3 


874849R1 


1968 


2621 
TABLE 4



SEQ ID NO:


Template ID


Component ID


Start


Stop


2 


227709.3 


1896188H1 


1989 


2217 


2 


227709.3 


2470872T6 


1988 


2644 


2 


227709.3 


1831904H1 


1994 


2217 


2 


227709.3 


2433378H1 


2003 


2173 


2 


227709,3 


4893472H1 


2003 


2337 


2 


227709.3 


2395264H1 


2016 


2217 


2 


227709.3 


4203434H1 


2018 


2350 


2 


227709.3 


4886606H1 


2025 


2329 


2 


227709.3 


4886606F6 


2025 


2094 


2 


227709.3 


504991m 


2036 


2217 


2 


227709.3 


31 23928 HI 


2035 


2317 


2 


227709.3 


372978R6 


2539 


2691 


2 


227709.3 


23561 68H1 


2576 


2698 


2 


227709.3 


213804H1 


2632 


2687 


2 


227709.3 


3347292H1 


461 


695 


2 


227709.3 


5013953H1 


518 




2 


227709.3 


4327103H1 


527 


78? 


2 


227709.3 


3295045H1 


569 


81 1 


2 


227709.3 


4012389H1 


621 




2 


227709.3 


1377181F1 


637 


1043 


2 


227709.3 


13771 81 HI 


637 




2 


227709.3 


588391 2H1 






2 


227709.3 


5886832H1 


663 


097 


2 


227709.3 


5881250H1 


664 


CM-K 


2 


227709.3 


3773 79H 1 


687 


Q4A 


2 


227709.3 


2098327H1 




o/n 


2 


227709.3 


351 1206H1 


728 




2 


227709.3 


4031927H1 


736 


993 


2 


227709.3 


2448606H1 


738 


977 


2 


227709.3 


2473593T6 


743 




2 


227709.3 


2444327H 1 


744 


975 


2 


227709.3 


388684H 1 


774 




2 


227709.3 


924593 Rl 


778 


1 1'54 


2 


227709.3 


924593H 1 


778 


1044 


2 


227709.3 


270753 7T6 


791 




2 


227709.3 


184232316 • 


803 


1331 


2 


227709.3 


1 386726H I 


813 


1109 


2 


227709.3 


1436357H1 


816 


1076 


2 


227709.3 


1436357F1 


816 


1379 


2 


227709.3 


1842323H1 


818 


1009 


2 


227709.3 


1842323R6 


818 


1347 


2 


227709.3 


2757452H1 


862 


1137 


2 


227709.3 


33891 7H1 


874 


1099 


2 


227709.3 


g 1382744 


934 


1319 


2 


227709.3 


g2880866 


952 


1325 


2 


227709.3 


5289932H1 


954 


1212 


2 


227709.3 


736987R6 


967 


1219 


2 


227709.3 


92955000 


966 


1369 


2 


227709.3 


736987H1 


967 


1187 


2 


227709.3 


4792654H1 


971 


1250 
TABLE 4



SEQ ID NO: 


Template ID 


Component ID 


Start 


Stop 


2 


227709.3 


270359H1 


989 


1337 


2 


227709.3 


2532882H1 


985 


1263 


2 


227709.3 


3246655H1 


990 


1254 


2 


227709.3 


g2 106694 


1001 


1373 


2 


227709.3 


6210707H1 


1069 


1397 


2 


227709.3 


340933H1 


1086 


1262 


2 


227709.3 


46389 14Hi 


1094 


1344 


2 


227709.3 


6211794H1 


1106 


1397 


2 


227709.3 


1220278H1 


1148 


1408 


2 


227709.3 


47629 16H1 


1193 


1486 


2 


227709.3 


6208765H1 


1202 


1504 


2 


227709.3 


3167466H1 


1208 


1497 


2 


227709.3 


4541633H1 


1218 


1466 


2 


227709.3 


1389793H1 


1 


242 


2 


227709.3 


1221 361 HI 


146 


324 


2 


227709.3 


6210207H1 


1217 


1524 


2 


227709.3 


6208587H1 


1217 


1448 


2 


227709.3 


432063H1 


149 


454 


2 


227709.3 


2197686H1 


1241 


1389 


2 


227709.3 


3871923H1 


142 


426 


2 


227709.3 


2473593H1 


205 


438 


2 


227709.3 


014086H1 


1244 


1539 


2 


227709.3 


g 1809628 


1244 


1573 


2 


227709.3 


g570725 


1274 


151D 


2 


227709.3 


013903H1 


1285 


1535 


2 


227709.3 


3159468H1 


1288 


1590 


2 


227709.3 


5219130H1 


1303 


1575 


2 


227709.3 


2473593F6 


205 


346 


2 


227709.3 


862771 HI 


1304 


1579 


2 


227709.3 


336851 OH 1 


1323 


1609 


2 


227709.3 


2473008H1 


1340 


1591 


2 


227709.3 


3633133H1 


1353 


1666 


2 


227709.3 


2470872F6 


267 


453 


2 


227709.3 


2472485H1 


1367 


1615 


2 


227709.3 


3940725H1 


1371 


1561 


2 


227709.3 


1 83961 2H1 


1376 


1668 


2 


227709.3 


1839635H1 


1376 


1702 


2 


227709.3 


2440859H1 


1388 


1640 


2 


227709.3 


2560406H] 


1389 


1677 


2 


227709,3 


9776447 


1411 


1595 


2 


227709.3 


g 892952 


1432 


1807 


2 


227709.3 


4753851 HI 


1460 


1734 


2 


227709.3 


855862R1 


1460 


2074 


2 


227709.3 


855862H1 


1460 


1682 


2 


227709.3 


5436709H1 


1470 


1708 


2 


227709.3 


3872292H1 


1474 


1684 


2 


227709.3 


2470872H1 


267 


518 


2 


227709.3 


4738304H2 


363 


616 


2 


227709.3 


63461 3H1 


376 


613 


2 


227709.3 


489031 OH 1 


1474 


1746 
TABLE 4



SEQ ID NO:


Template ID


Component ID


Start


Stop 


2 


227709.3 


964305H1 


396 


658 


2 


227709.3 


gl303518 


1483 


2011 


2 


227709.3 


4081986H1 


1497 


1695 


2 


227709.3 


2731973H1 


1501 


1745 


2 


227709.3 


964305R1 


396 


999 


2 


227709.3 


4984432H1 


398 


674 


2 


227709,3 


4323306H1 


407 


691 


2 


227709.3 


656683H1 


431 


693 


2 


227709.3 


656689H1 


431 


723 


2 


227709.3 


g 774837 


1512 


1815 


2 


227709.3 


9573087 


1512 


1840 


2 


227709.3 


303841 9H1 


1514 


1800 


2 


227709.3 


28309 14H1 


1515 


1788 


2 


227709.3 


5206976H2 


1530 


1806 


2 


227709.3 


073335T6 


2039 


2650 


2 


227709.3 


g3 173537 


2054 


2217 


2 


227709.3 


862087T1 


2058 


2649 


2 


227709.3 


862087H1 


2058 


2344 


2 


227709.3 


g 1955564 


2058 


2421 


2 


227709.3 


2469827T6 


2058 


2648 


2 


227709.3 


2395978H1 


2063 


2318 


2 


227709.3 


1 83324 1T6 


2074 


2648 


2 


227709.3 


1 226609H 1 


2075 


2353 


2 


227709.3 


1 43690 1T6 


2075 


2641 


2 


227709.3 


2827153H1 


2083 


2449 


2 


227709.3 


1878980H1 


2083 


2374 


2 


227709.3 


2473240T6 


2083 


2643 


2 


227709.3 


2431389H1 


2087 


2329 


2 


227709.3 


2400824H1 


2087 


2358 


2 


227709.3 


1 879688 F6 


2095 


2530 


2 


227709.3 


1879688t-n 


2095 


2389 


2 


227709.3 


1879688T6 


2096 


2653 


2 


227709.3 


2195425H1 


2098 


2400 


2 


227709.3 


g 196261 8 


2102 


2694 


2 


227709.3 


2325847 HI 


2112 


2374 


2 


227709.3 


4468886H1 


2112 


2414 


2 


227709.3 


3940725T6 


2118 


2654 


2 


227709.3 


2917539H1 


2119 


2418 


2 


227709.3 


1347063H1 


2130 


2371 


2 


227709.3 


55456SH1 


2137 


2387 


2 


227709.3 


323431 HI 


2148 


2442 


2 


227709.3 


450205H1 


2150 


2378 


2 


227709.3 


144a863Hl 


2150 


2423 


2 


227709.3 


24721 38T6 


2186 


2646 


2 


227709.3 


736987T6 


2206 


2648 


2 


227709.3 


g3932020 


2237 


2687 


2 


227709.3 


1676401 HI 


2247 


2475 


2 


227709.3 


406441 HI 


2259 


2522 


2 


227709.3 


334657H1 


2259 


2512 


2 


227709.3 


5888253H1 


2261 


2496 
TABLE 4



SEQ ID NO:


Template ID 


Component ID 


Start 


Stop 


2 


227709.3 


3102430H1 


2261 


2530 


2 


227709.3 


6106355H1 


2261 


2524 


2 


227709.3 


93182016 


2261 


2693 


2 


227709.3 


604792H1 


2261 


2486 


2 


227709.3 


2428425H1 


2261 


2433 


2 


227709.3 


5883372H1 


2261 


2444 


2 


227709.3 


94264131 


2262 


2702 


2 


227709.3 


633186H1 


2262 


2544 


2 


227709.3 


3993266H1 


2261 


2540 


2 


227709.3 


201 7401 HI 


2263 


2427 


2 


227709.3 


2424057H1 


2268 


2536 


2 


227709.3 


404680H1 


2273 


2496 


2 


227709.3 


3153874H1 


2288 


2593 


2 


227709.3 


94291477 


2288 


2694 


2 


227709.3 


4061504H1 


2288 


2546 


2 


227709.3 


92968506 


2289 


2694 


2 


227709.3 


3123730H1 


2289 


2606 


2 


227709.3 


3123955H1 


2289 


2589 


2 


227709.3 


94115080 


2293 


2694 


2 


227709.3 


308633H1 


2295 


2534 


2 


227709.3 


2397634H1 


2297 


2495 


2 


227709.3 


94332177 


2302 


2694 


2 


227709.3 


94267824 


2303 


2694 


2 


227709.3 


308633F1 


2303 


2687 


2 


227709.3 


308633R1 


2303 


2687 


2 


227709.3 


2371 681 HI 


2309 


2551 


2 


227709.3 


2421774H1 


2309 


2547 


2 


227709.3 


91810042 


2311 


2687 


2 


227709.3 


93958397 


2313 


2677 


2 


227709.3 


4695290H1 


2316 


2582 


2 


227709.3 


2756565H1 


2320 


2630 


2 


227709.3 


9612404 


2323 


2687 


2 


227709.3 


2445854H1 


2327 


2587 


2 


227709.3 


449703H1 


2332 


2458 


2 


227709.3 


28771 81 HI 


2334 


2622 


2 


227709.3 


1211271R1 


2334 


2687 


2 


227709.3 


1211271T1 


2334 


2649 


2 


227709.3 


12n27]Hl 


2334 


2608 


2 


227709.3 


9616409 


2341 


2660 


2 


227709.3 


2017458H1 


2340 


2619 


2 


227709.3 


1534375H1 


2342 


2572 


2 


227709.3 


3145020H1 


2344 


2683 


2 


227709.3 


1539734H1 


2358 


2600 


2 


227709.3 


37861 74H1 


2363 


2661 


2 


227709.3 


1 454741 Fl 


2367 


2687 


2 


227709.3 


1454741 HI 


2367 


2632 


2 


227709.3 


271 7951 HI 


2372 


2548 


2 


227709.3 


1 35981 6H1 


2374 


2618 


2 


227709.3 


1 35981 6F1 


2374 


2694 


2 


227709.3 


9564645 


2385 


2694 
TABLE 4



SEQ ID NO:


Template ID


Component ID


Start


Stop 


2 


227709.3 


91154316 


2388 


2698 


2 


227709.3 


9891561 


2393 


2707 


2 


227709.3 


24733 12H1 


2393 


2648 


2 


227709.3 


92992779 


2394 


2643 


2 


227709.3 


1218580T6 


2395 


2649 


2 


227709.3 


1218580T1 


2395 


2649 


2 


227709.3 


9824343 


2396 


2719 


2 


227709.3 


1218580R6 


2395 


2691 


2 


227709.3 


1218573H1 


2395 


2641 


2 


227709.3 


286961 HI 


2403 


2690 


2 


227709.3 


4367190H1 


2408 


2684 


2 


227709.3 


23991 22H1 


2420 


2672 


2 


227709.3 


5882896H1 


2423 


2687 


2 


227709.3 


9967282 


2422 


2700 


2 


227709.3 


5883904H1 


2423 


2687 


2 


227709.3 


5883908H1 


2423 


2687 


2 


227709.3 


5882375H1 


2424 


2567 


2 


227709.3 


gll91305 


2429 


2710 


2 


227709.3 


794290H1 


2437 


2664 


2 


227709.3 


520742H1 


2442 


2678 


2 


227709.3 


9646174 


2443 


2687 


2 


227709.3 


1538631 HI 


2443 


2651 


2 


227709.3 


1722285H1 


2444 


2677 


2 


227709.3 


91202716 


2469 


2701 


2 


227709.3 


86290371 


2482 


2648 


2 


227709.3 


095638H1 


2484 


2694 


2 


227709.3 


862903R1 


2484 


2694 


2 


227709.3 


3124846H1 


2490 


2692 


2 


227709.3 


5906470H1 


2497 


2691 


2 


227709.3 


2535162H1 


2525 


2651 


2 


227709.3 


4460277H1 


2533 


2694 


2 


227709.3 


372978T6 


2539 


2649 


2 


227709.3 


372978H1 


2539 


2686 


3 


237703.2 


91963754 


1 


374 


3 


237703.2 


91137733 


95 


407 


3 


237703.2 


9843567 


124 


414 


3 


237703.2 


3070350H1 


189 


477 


3 


237703.2 


3070350F6 


189 


709 


3 


237703.2 


1439542H1 


413 


686 


3 


237703.2 


3203352H1 


437 


711 


3 


237703.2 


92013304 


598 


954 


3 


237703.2 


3799002H1 


701 


1010 


3 


237703.2 


044160T6 


958 


1453 


3 


237703.2 


824258H1 


1020 


1254 


3 


237703.2 


9189439^ 


1031 


1472 


3 


237703.2 


3491432H1 


1052 


1315 


3 


237703.2 


2601554F6 


1059 


1602 


3 


237703.2 


2601554H1 


1060 


1336 


3 


237703.2 


4617816H1 


1073 


1337 


3 


237703.2 


4058186H1 


nil 


1197 
TABLE 4



SEQ ID NO:


Template ID


Component ID


Start


Stop 


3 


237703.2 


5554248H1 


1145 


1355 


3 


237703.2 


5554148H1 


1145 


1386 


3 


237703.2 


g3594366 


1199 


1611 


3 


237703.2 


5122547H1 


1278 


1516 


3 


237703.2 


2278690H1 


1379 


1654 


3 


237703.2 


2278690R6 


1379 


1879 


3 


237703.2 


5564683H1 


1402 


1658 


3 


237703.2 


9505 19H1 


1436 


1678 


3 


237703.2 


95051 9R6 


1436 


1723 


3 


237703.2 


91123529 


1437 


1805 


3 


237703.2 


5260937 HI 


1514 


1744 


3 


237703.2 


3386649H1 


1527 


1744 


3 


237703.2 


5272970H1 


1532 


1780 


3 


237703.2 


3808872H1 


1533 


1838 


3 


237703.2 


9505 19T6 


1554 


2019 


3 


237703.2 


319252H1 


1592 


1986 


3 


237703.2 


4411457H1 


1669 


1947 


3 


237703.2 


g 1933240 


1669 


2147 


3 


237703.2 


2601554T6 


1698 


2313 


3 


237703.2 


824257T6 


1699 


2313 


3 


237703.2 


2859138T6 


1776 


2312 


3 


237703.2 


1872409F6 


1788 


2150 


3 


237703.2 


1872409H1 


1788 


2062 


3 


237703.2 


1572418H1 


1798 


1997 


3 


237703.2 


1872409T6 


1805 


2312 


3 


237703.2 


530381 HI 


1810 


1963 


3 


237703.2 


5583255H1 


1811 


2075 


3 


237703.2 


126942H1 


1816 


2025 


3 


237703.2 


2278690T6 


1829 


2311 


3 


237703.2 


121198481 


1832 


2066 


3 


237703.2 


2703527H1 


1832 


2106 


3 


237703.2 


3253906H1 


1885 


2159 


3 


237703.2 


g3934221 


1896 


2349 


3 


237703.2 


1620273H1 


1897 


2117 


3 


237703.2 


g28 19399 


1908 


2351 


3 


237703.2 


93895924 


1936 


2349 


3 


237703.2 


92881190 


1962 


2270 


3 


237703.2 


040587H1 


1971 


2158 


3 


237703.2 


93147053 


1973 


2349 


3 


237703.2 


9843522 


2009 


2349 


3 


237703.2 


91844904 


2023 


2349 


3 


237703.2 


92881790 


2024 


2349 


3 


237703.2 


92820075 


2030 


2349 


3 


237703.2 


92237723 


2051 


2350 


3 


237703.2 


2385 12H1 


2085 


2313 


3 


237703.2 


292954H1 


2182 


2320 


3 


237703.2 


92013921 


2238 


2522 


3 


237703.2 


g 1980268 


2386 


2742 


4 


240091.1 


2898155H1 


1 


289 


4 


240091.1 


2434264H1 


3 


215 
TABLE 4



SEQ ID NO:


Template ID


Component ID


Start 


Stop 


4 


240091.1 


5075278H1 


3 


127 


4 


240091.1 


2647594H1 


1720 


1770 


4 


240091.1 


4785026H1 


1803 


2073 


4 


240091.1 


4785001 HI 


1803 


2069 


4 


240091.1 


3600323H1 


1258 


1557 


4 


240091.1 


24482 18F6 


1267 


1485 


4 


240091.1 


244821 8H1 


1267 


1508 


4 


240091.1 


1391961T6 


1267 


1730 


4 


240091.1 


083349H1 


1309 


1464 


4 


240091.1 


070643H1 


1309 


1543 


4 


240091.1 


3712971T6 


1317 


1744 


4 


240091.1 


3399693H1 


1384 


1606 


4 


240091.1 


93000643 


1400 


1562 


4 


240091.1 


93659260 


1446 


1772 


4 


240091.1 


3491102H1 


1447 


1559 


4 


240091.1 


2286846H1 


1496 


1696 


4 


240091.1 


45751 30H1 


1513 


1756 


4 


240091.1 


2584803H1 


1539 


1770 


4 


240091.1 


2584803F6 


1539 


1770 


4 


240091.1 


4399541 HI 


1590 


1833 


4 


240091.1 


489490H1 


1599 


1844 


4 


240091.1 


5541073H1 


1605 


1804 


4 


240091.1 


797486H1 


1609 


1772 


4 


240091.1 


2435453H1 


3 


231 


4 


240091.1 


2434264R6 


3 


491 


4 


240091,1 


5279 14H1 


4 


275 


4 


240091.1 


4382936H1 


10 


241 


4 


240091.1 


4210908H1 


20 


292 


4 


240091.1 


3615238H1 


47 


340 


4 


240091.1 


3615238F6 


47 


528 


4 


240091.1 


2733107H1 


54 


275 


4 


240091.1 


494380H1 


62 


307 


4 


240091.1 


1391 961 F6 


66 


475 


4 


240091.1 


1391961H1 


66 


318 


4 


240091.1 


5801 12H1 


359 


558 


4 


240091.1 


1232706F6 


389 


842 


4 


240091.1 


1232706H1 


389 


629 


4 


240091.1 


3487133H1 


432 


698 


4 


240091.1 


g4244249 


51 1 


981 


4 


240091.1 


4476 19H1 


580 


799 


4 


240091.1 


578221 4H1 


670 


964 


4 


240091.1 


4913546F6 


830 


1249 


4 


240091.1 


4913546H1 


830 


1108 


4 


240091.1 


38921 11 HI 


834 


1130 


4 


240091.1 


4742244H1 


836 


1102 


4 


240091.1 


g 1484624 


867 


1316 


4 


240091.1 


2376485F6 


901 


1205 


4 


240091.1 


2376485H1 


901 


1124 


4 


240091.1 


2376485T6 


902 


1167 


4 


240091.1 


1849607H1 


907 


1198 
TABLE 4



SEQ ID NO:


Template ID


Component ID


Start


Stop

4 


240091.1 


4030992H 1 


952 




4 


240091.1 


2764886H1 


975 




4 


240091 .1 


4665206H 1 


1 1 38 


401 


4 


240091 .1 


I -lO^/ L/O 1 O 






4 


240091 .1 


2434264T6 


1 210 


]7A 


4 


240091 .1 


442431 SHI 




ilso 
1459 


5 


243096.6 


y o^^ov 1 y 


innA 


1482 


5 


243096.6 


yo^y/o 1 o 


iniA 


1479 


5 


243096.6 


yovoo/uo 


inifl 


1479 


5 


243096.6 




inoo 


1487 


5 


243096.6 


i UOU/ r n 1 


imn 


1 104 


5 


243096.6 


/ OQZon 1 


imo 
in ^ 


1 183 


5 


243096 6 


i-i/i/m7/i/i7 


1045 


1481 


5 


243096 6 


y4ii.i:<io^i} 


ln7^ 


1480 


5 


243096.6 


A'^ 1 OnODA 


1076 


1478 


5 




go 1 1 ooVo 


1075 


1478 


5 


O/IinOA A 
.^^oUVO.O 


g2577165 


1076 


1480 


^ 




1 OOVOZ 


1078 


1493 




ZAOUVO.O 


g2063697 


1079 


1482 


5 


94'^nOA A 


/:^.^UOzn 1 


1079 


1215 




z4oUVO.O 


222052F1 


1078 


1479 




Q/nnoA A 


222052 R1 


1078 


1479 




243096.6 


g3737532 


1083 


1509 




o/iQnoA A 


1 o4y/<i4l6 


1095 


1440 






g4n0131 


1 1 15 


1481 




OA'inOA A 

/:*louyo.o 


A*a T ormTA 


1 1 16 


1438 




OylOnOA A 


2446727T6 


1 1 19 


1438 




0/1'?nOA A 


g3 155321 


1 129 


1475 




O.^TnOA A 


g31 781 76 


1 148 


1494 




O^'^nOA A 


9481 77H 1 


1 149 


1428 


5 


O/nnoA A 


y4o 1 / / K 1 


1 149 


1488 




0/l"3nOA A 


19421 76H1 


1 163 


1441 


5 


243096 6 


1 O/m VADA 

1 y^tiC 1 / oko 


1 163 


1418 


5 


Oyl'^nOA A 


1 OAO 1 AQLJ 1 

i v^^: 1 OOn 1 


1 163 


1440 


5 


o^-jnOA A 

Z*40U70.0 


Ooo4 1 oon 1 


1 164 


1433 


5 


o/ionoA A 
^Mouyo.o 


lOO 1410 


1209 


1430 


5 


O/j-snOA A 
^**ouyo.o 


1/11 neil ylU 1 
14 1 oo 14n 1 


1216 


1431 


5 




1 4 1 oo04n 1 


1216 


1468 


5 




1/11 fli;i >icA 
14 1 oo i4rO 


1216 


1479 


5 




A'JOQyiOLJl 

OvJ^o4on 1 


1218 


1458 


5 




/-I /1 1 nT7i 1 
g4 lU// 1 1 


1220 


1526 


5 


243096.6 


g4457962 


1227 




5 


243096.6 


g2837785 


1228 


1479 


5 


243096.6 


g8 19991 


1238 


1496 


5 


243096.6 


g564440 


1237 


1488 


5 


243096.6 


g816379


1251 


1540 


5 


243096.6 


g885380 


1252 


1488 


5 


243096.6 


g768804 


1261 


1481 


5 


243096.6 


6093263H1 


1263 


1492 


5 


243096.6


g645318 


1286 


1488 



TABLE 4



SEQ ID NO:


Template ID


Component ID


Start


Stop


5 


243096.6 


g566867 


1292 


1526 


5 


243096.6 


g816311


1302 


1679 


5 


243096.6 


9671079 


1296 


1488 


5 


243096.6 


92219072 


1297 


1472 


5 


243096.6 


92539665 


1300 


1479 


5 


243096.6 


9670466 


1300 


1526 


5 


243096.6 


2474482H1 


1313 


1542 


5 


243096.6 


g4328047


1314 


1487 


5 


243096.6 


92205935 


1335 


1454 


5 


243096.6 


9832021 


1367 


1678 


5 


243096.6 


92205936 


1375 


1472 


5 


243096.6 


92789365 


1393 


1479 


5 


243096.6 


9873007 


1418 


1540 


5 


243096.6 


9900098 


1419 


1488 


5 


243096.6 


9567639 


1424 


1667 


5 


243096.6 


1915362H1


1473 


1731 


5 


243096.6 


4727611H1


1576 


1854 


5 


243096.6 


9822245 


1648


1976 


5 


243096.6 


gg 12869 


1651 




5 


243096.6 


9830918 


1654 


2012


5 


243096.6 


1414612H1






5 


243096.6 


4761250H1 


1662 


1939 


5 


243096.6 


9678372 


1665 


1972 


5 


243096.6 


g561207 


1665 


1955 


5 


243096.6 


92002379 


1665 


2002 


5 


243096.6 


9709471 


1665 


1866 


5 


243096.6 


4761242H1 


1664 




5 


243096.6 


9518391 


1678 


1933 


5 


243096.6 


4595686H1 


1707 




5 


243096.6 


1511493H1


1792 


1996 


5 


243096.6 


1511493F6


1792 




5 


243096.6 


1512376H1 


1792 


2009 


5 


243096.6 


g2003356 


1922 




5 


243096.6 


4697285H1 


2203 


2446 


5 


243096.6 


4941432H1


2381 


2673 


5 


243096.6 


1230891H1


2413 


2508 


5 


243096.6 


1522037H1 


2421 


2625 


5 


243096.6 


3749969H1 


2502 


2799 


5 


243096.6 


2125142H1 


2527 


2795 


5 


243096.6 


2125142F6 


2627 


2841 


5 


243096.6 


121562H1 


2620 


2806 


5 


243096.6 


856668H1 


2745 


2933 


5 


243096.6 


5882521H1


2758 


3030 


5 


243096.6 


5888582H1 


2759 


2976 


5 


243096.6 


5882569H1 


2760 


3030 


5 


243096.6 


9775350 


2793 


3137 


5 


243096.6 


9705857 


2790 


3138 


5 


243096.6 


92002380 


2803 


3138 


5 


243096.6 


5927949H1 


2845 


3140 


5 


243096.6 


1511493T6 


2883 


3500 



5 


243096.6 


1335311H1


2951 


3205 


5 


243096.6 


1613745H1 


2965 


3179 


5 


243096.6 


3472823H1 


3005 


3245 


5 


243096.6 


g570224 


3048 


3318 


5 


243096.6 


g4095588 


3134 


3545 


5 


243096.6 


9831152 


3143 


3366 


5 


243096.6 


94286632 


3205 


3476 


5 


243096.6 


5907720H1 


3208 


3501 


5 


243096.6 


94187457 


3286 


3557 


5 


243096.6 


93842315 


3295 


3465 


5 


243096.6 


94005713 


3303 


3465 


5 


243096.6 


94006389 


3305 


3465 


5 


243096.6 


94006377 


3305 


3559 


5 


243096.6 


94006150 


3305 


3537 


5 


243096.6 


94006070 


3315 


3542 


5 


243096.6 


94187003 


3315 


3554 


5 


243096.6 


94188554 


3315 


3537 


5 


243096.6 


94006771 


3315 


3537 


5 


243096.6 


94072007 


3316 


3542 


5 


243096.6 


94017934 


3316 


3537 


5 


243096.6 


94150328 


3316 


3465 


5 


243096.6 


94005644 


3316 


3537 


5 


243096.6 


5840086H1 


3345 


3553 


5 


243096.6 


5289394H1 


3472 


3737 


5 


243096.6 


9710217 


3508 


3789 


5 


243096.6 


9694295 


3619 


3781 


5 


243096.6 


92206232 


3623 


3794 


5 


243096.6 


92206104 


3656 


3795 


5 


243096.6 


2897215H1


1 


249 


5 


243096.6 


3541808H1 


181 


397 


5 


243096.6 


2352032H1 


32 


249 


5 


243096.6 


2446727F6 


44 


104 


5 


243096.6 


3123367H1 


44 


356 


5 


243096.6 


4385825H1 


181 


379 


5 


243096.6 


2446727H1 


44 


308 


5 


243096.6 


2905666H1


45 


326 


5 


243096.6 


2767616H1


46 


308 


5 


243096.6 


4521271H1 


214 


473 


5 


243096.6 


1725750H1


47 


209 


5 


243096.6 


g1965606


235 


621 


5 


243096.6 


3117919H1


47 


328 


5 


243096.6 


2762827H1 


49 


309 


5 


243096.6 


5395762H1 


245 


510 


5 


243096.6 


5585677H1 


251 


484 


5 


243096.6 


3416289H1


253 


507 


5 


243096.6 


5407275H1 


257 


511 


5 


243096.6 


5407149H1 


257 


520 


5 


243096.6 


3452689H1 


49 


240 


5 


243096.6 


4819033H1 


292 


515 


5 


243096.6 


483458H1 


50 


302 


5 


243096.6 


1941753H1


291 


537 




243096.6 


485741H1


50 


296 


5 


243096.6 


g766717


307 


480 


5 


243096.6 


2215901H1


51 


147 


5 


243096.6 


2990593H1 


73 


383 


5 


243096.6 


4017755H1 


76 


378 


5 


243096.6 


2558416H1


81 


363 


5 


243096.6 


870507R1 


82 


682 


5 


243096.6 


870507H1 


82 


339 


5 


243096.6 


3692401H1


82 


285 


5 


243096.6 


2387055H1 


83 


336 


5 


243096.6 


1641267H1 


329 


555 


5 


243096.6 


2483682H1 


84 


331 


5 


243096.6 


4977750H1 


89 


382 


5 


243096.6 


g4244257 


343 


"7^ 


5 


243096.6 


3165660H1 






5 


243096.6 


4043243H1


89 


dnl 


5 


243096.6 


3500940H1






5 


243096.6 


3580985H1


^9^ 


415 


5 


243096.6 


1 m AO'^^'^PA 
1 o ! oyooro 




405 


5 




go/ zoo^: 


oo 


^\t. 


5 






09 


418 




o/nnoA A 






624 


5 


243096.6 


zoo J / oon 1 






5 


243096.6 


nA79ft/l'^ 
yu/ zo*4o 


99 


^zl 


5 


243096.6 


790680R1 


2^ 


07ft 


5 


243096.6 


790680H1 


365 


^fl4 


5 






0^ 


7m 


5 


243096.6 


1518953H1 




9fln 


5 


243096.6 


•3:^1 n'^'^ SHI 






5 


243096.6 


4401867H1




653 


5 


243096.6


1 AO/i97Al-n 
loz^z/on 1 




583 


5 


243096.6 


3099469H1


09 




5 


243096.6 


oft? "^107 




ARA 


5 


243096.6 


g874944 


93 


^09 


5 


243096.6 


2201243H1


411


A67 


5 


243096.6 


4907323H2 


97 


^77 


5 


243096.6 


2215590H1 


421 




5 


243096.6 


1647105H1


102 




5 


243096.6 


3337242H1


105 




5 


243096.6 


3328567H1


421 


709 


5 


243096.6 


5165830H1 


113 


391 


5 


243096.6 


1919378R6 


432 


865 


5 


243096.6 


2078775H1 


114 


391 


5 


243096.6 


1919378H1 


432 


700 


5 


243096.6 


1798353H1 


115 


371 


5 


243096.6 


5109893H1 


447 


675 


5 


243096.6 


3581083H1 


116 


378 


5 


243096.6 


2202470H1


456 


711 


5 


243096.6 


1642210H1 


461 


676 



5 


243096.6 


2828817H1


122 


399 


5 


243096.6 


1642206H1


461 


676 


5 


243096.6 


g669309 


136 


449 


5 


243096.6 


4932616H1


463 


618 


5 


243096.6 


g571269 


136 


492 


5 


243096.6 


1753890H1 


464 


690 


5 


243096.6 


756217H1


464 


706 


5 


243096.6 


3030389H1 


464 


764 


5 


243096.6 


g677690 


136 


462 


5 


243096.6 


1754027H1 


464 


704 


5 


243096.6 


3893562H1 


139 


449 


5 


243096.6 


g885379 


139 


482 


5 


243096.6 


026083H1 


509 


692 


5 


243096.6 


2758492H1 


143 


413 


5 


243096.6 


836506R1 


513 


1092 


5 


243096.6 


5294950H1 


147 


395 




.£iAoUVO.O 


OoOOUOrl 1 






5 


243096.6


1 ozuzoon I 


517 


685 




O/l'^nOA A 


1 /oUo t n 1 








0/1 "^nOA A 

ZAoUvo.o 


D 1 y/ttzUHz 




oln 




OA "^nOA A 


g,/z4Uyyo 








0/1'?riOA A 


g766746


1A1 

]z 


fi^ 




^^ouyo.o 




iln 








g677072 




Q^n 






ZOZO 1 lOlO 


S74 






243096.6


nOR 1 AAAA 


612 


fl74 




Oyt^nOA A 


/1C'5A'i^OU 1 








Oyl'^nOA A 


OA7n7'^7M 1 




R6R 




O/linOA A 


ylOO/IOOflU 1 

ityytiz^on i 


A07 






O/i-jnOA A 


70'?^OAU1 

/yooyon i 


AQ7 






O/nnoA A 


^-iOnt;QOA'5 

g^uooyoo 




0/I1 




O/'inOA A 


1 RAnAnoui 
1 oouou.<in 1 


710 


04ft 




O/l'^nOA A 


1 ooooouri 1 


710 


fiOfi 




o/nnoA A 


r-ioncaoAA 
g^uoOoOo 


7A^ 


O-iA 




O/l'^nOA A 


1 A/1/10UI 1 

t OA^yn 1 




IfW^ 




o/i-anOA A 
zAouyo.o 


AaAACjAMT 


771 


1028 


5 


243096.6


O/ ZOO.<lOI i 


783 


1432 


5 


243096.6 


3659224H2 


789 


1073 


5 


243096.6 


639411H1






5 


243096.6 


5332257H1


820 


1063 


5 


243096.6 








5 


243096.6 


2652025T6 


859 


1424 


5 


243096.6 


96187916 


859 


1437 


5 


243096.6 


5088178T6


858 


1465 


5 


243096.6 


1849724F6 


860 


1441 


5 


243096.6 


1849724H1 


860 


1135 


5 


243096.6 


5395762T1 


884 


1440 


5 


243096.6 


1919378T6 


880 


1452 


5 


243096.6 


2663785H1 


891 


1144 


5 


243096.6


3625012H1


904 


1051 



5 


243096.6 


2683C22H1 


931 


1212 


5 


243096.6 


3499439H1 


933 


1219 


5 


243096.6 


4005888H1 


935 


1211 


5 


243096.6 


1682469T7 


939 


1485 


5 


243096.6 


2351530H1 


941 


1130 


5 


243096.6 


92063947 


951 


1207 


5 


243096.6 


1668866H1


960 


1201 


5 


243096.6 


1667633H1 


960 


1220 


5 


243096.6 


92358923 


963 


1060 


5 


243096.6 


3085780H1 


971 


1082 


5 


243096.6 


9814507 


161 


443 


5 


243096.6 


9830479 


161 


470 


5 


243096.6 


9816410 


162 


519 


6 


244366.6 


1889554H1 


493 


750 


6 


244366.6 


1889554F6 


493 


939 


6 


244366.6 


4298324H1 


1 


255 


6 


244366.6 


853003H1 


12 


225 


6 


244366.6 


92178494 


517 


912 


6 


244366.6 


853003R6 


26 


483 


6 


244366.6 


3327565H1 


555 


787 


6 


244366.6 


2263295H1 


278 


532 


6


244366.6 


5401026H1 


604 


816 


6 


244366.6 


1285225H1 


606 


862 


6 


244366.6 


2674162H1 


661 


904 


6 


244366.6 


3101288H1 


295 


585 


6 


244366.6 


3295139H1


815 


1056 


6 


244366.6 


6002940H1 


886 


1170


6 


244366.6 


3101288F6 


295 


694 


6 


244366.6 


6002740H1 


904 


1170


6 


244366.6 


3246058F6 


941 


1282 


6 


244366.6 


3246058H1 


941 


1192


6 


244366.6 


3887233H1 


959 


1240 


6 


244366.6 


2431320H1 


972 


1192


6 


244366.6 


1513444H1 


978 


1189


6 


244366.6 


2813740H1 


1071 


1363 


6 


244366.6 


2815664H1


1071 


1274 


6 


244366.6 


2813707H1 


1071 


1359 


6 


244366.6 


3492628H1 


1138 


1414 


6 


244366.6 


2183893H1 


1190 


1450 


6 


244366.6 


5641164H1 


1238 


1485 


6 


244366.6 


580082H1 


1239 


1487 


6 


244366.6 


3155135H1 


1254 


1487 


6 


244366.6 


3075416H1


1282 


1565 


6 


244366.6 


3559024H1 


1351 


1639 


6 


244366.6 


3451987H1 


1525 


1785 


6 


244366.6 


4378692H1 


1597 


1817 


6 


244366.6 


92162961 


1742 


2237 


6 


244366.6 


3890528H1 


1764 


1919 


6 


244366.6 


5017346H1 


3006 


3272 


6 


244366.6 


1690531H1


2938 


3105 






WO 00/75298


6 


244366.6 


5S90083H1 


3007 


3133 


6 


244366.6 


4614333H1 


3009 


3145 


6


244366.6 


2783117H1


2979 


3239 




244366.6 


3037603H1 


3012 


3292 


6


244366.6 


032789H1 


3013 


3234 






Of nz 


3029 


3291 


6


244366.6 


2626903H1


3031 


3283 




244366.6 


1403785H1 


3034 


3328 




244366.6 




2980 


3254 






uuo / z/ n 1 


3043 


3400 






uwo 1 / on I 




3543 






uwooo^n 1 


3043 


3424 




O/ynAA A 


uuo 1 c5on 1 








O/yl'^AA A 


nm.1 HQUi 
uuo 1 ozn I 








0//1'5AA A 


uuo/u 1 n 1 


3043 


34/1 




244366.6 


uUoO 1 on 1 




3429 




244366.6 


00*3 1 C QLJ 1 








O/zl'^AA A 
ZAAoOO.O 


UUo 1 Z/n 1 


3043 


3453 




244366.6 


UU>J4oon 1 


3043 


3433 


6 


244366.6 


003422H1


3043 


3380 




244366.6 


OZvU6 1 /H 1 


3002 


3300 


6 


244366.6 


003521H1


3043 


3549 




244366.6 


003294H1


3043 


3411




244366.6 


003642H1


3043 


3400 




244366.6 


nr»QAy1ALJ 1 

mJo04oH 1 


3043 


3405 


6 


244366.6 


003660H1


3043 


3392 




244366.6 


0 1 ooo 1 zn 1 










no/iAnRui 1 
uyAouon 1 


3081 


3258 




244366.6 


1 7266G2H1 












^lOO 






244366.6 


zoo 1 vJDUH I 


^nn 






0/1/1 "^AA A 


/Ooz/On 1 


3130 






244366.6 


o^VOzO 1 n 1 








O/lyl'iAA A 


o/uo/'4zn 1 




^^71 








3 75 


^71'=> 




244366.6 


7 1 m OQOTA 
o 1 U 1 ZdoIO 


^9ni 








OOO 1 oozn 1 




^A^9 




0.^/1 '?AA A 


n7y17'^/1U1 

u/'i/o'4n 1 


?9n9 

3^9 


340^ 






n7'5 "3001-11 








O/I/I'^AA A 


lOTt^/t/llTA 


^944 


^41A 




244366.6


O**/ / / ^iMri 1 










91 1 9'=190Ti*i 
Z 1 1 £Sj£rf 1 0 


3250 


3723 


6 


244366.6 


g3933445 


3274 


3756 


6 


244366.6 


4861989H1 


3274 


3567 


6 


244366.6 


1737024F6 


3292 


3734 


6 


244366.6 


g2584374 


3285 


3757 


6 


244366.6 


1735490H1 


3292 


3561 


6 


244366.6 


1737024H1


3292 


3554 


6 


244366.6 


393035H1 


3303 


3590 


6 


244366.6 


2158031F6


3307 


3758 



6 


244366.6 


g4438605 


3308 


3758 


6 


244366.6 


2554055H1 


3325 


3624 


6 


244366.6 


92934389 


3326 


3756 


6 


244366.6 


94391414 


3341 


3756 


6 


244366.6 


4460645H1


3349 


3601 


6 


244366.6 


92208607 


3359 


3756 


6 


244366.6 


92318342 


3366 


3756 


6 


244366.6 


g1062585


3370 


3743 


6 


244366.6 


94081771 


3372 


3761 


6 


244366.6 


91925212 


3381 


3757 


6 


244366.6 


4726758H1 


3390 


3662 


6 


244366.6 


92410378 


3392 


3763 


6 


244366.6 


867419H1


3398 


3679 


6 


244366.6 


9616070 


3402 


3764 


6 


244366.6 


92163418 


3405 


3756 


6 


244366.6 


92555602 


3413 


3760 


6 


244366.6 


9561365 


3428 


3756 


6 


244366.6 


1889554T6


3433 


3725 


6 


244366.6 


92336915 


3444 


3756 


6 


244366.6 


g616115


3445 


3756 


6 


244366.6 


94435130 


3469 


3756 


6 


244366.6 


94525507 


3502 


3757 


6 


244366.6 


2158031H1


3523 


3756 


6 


244366.6 


g2401624 


3528 


3765 


6 


244366.6 


94268526 


3559 


3756 


6 


244366.6 


2009370H1 


3666 


3756 


6 


244366.6 


218073H1 


3682 


3756 


6 


244366.6 


2350763H1 


3699 


3760 


6 


244366.6 


1647483H1 


2520 


2769 


6 


244366.6 


2432662H1 


2526 


2757 


6 


244366.6 


901941R1


2538 


3096 


6 


244366.6 


901941H1 


2538 


2895 


6 


244366.6 


901981H1


2538 


2858 


6 


244366.6 


2052263H1 


2545 


2857 


6 


244366.6 


3483573H1 


2553 


2851 


6 


244366.6 


3565807H1 


2558 


2822 


6 


244366.6 


92278841 


2560 


2917 


6 


244366.6 


92178439


2563 


2917 


6 


244366.6 


92153824 


2565 


2917 


6 


244366.6 


91329145 


2568 


2874 


6 


244366.6 


2198515H1 


2647 


2908 


6 


244366.6 


2200581H1


2647 


2725 


6 


244366.6 


91548506 


2680 


3207 


6 


244366.6 


2321185H1 


2694 


2917 


6 


244366.6 


3244071T6


2568 


2798 


6 


244366.6 


2936492H1 


2700 


2917 


6 


244366.6 


600642H1 


2586 


2891 


6 


244366.6 


91124072 


2711 


2850 


6 


244366.6 


91833465 


2712 


2856 


6 


244366.6 


94327019 


2736 


2851 


6 


244366.6 


3165778H1 


2588 


2925 


6 


244366.6 


g2139164 


2594 


2835 


6 


244366.6 


633565H1 


2753 


2917 


6 


244366.6 


1520269F6 


2766 


3147 


6 


244366.6 


1520026H1 


2766 


2917 


6 


244366.6 


1520269H1 


2766 


2917 


6 


244366.6 


g4095144 


2611 


2917 


6 


244366.6 


1237708H1 


2767 


3016 


6 


244366.6 


2768411H1


2770 


3013 


6 


244366.6 


g1321416


2615 


2858 


6 


244366.6 


2416809H1 


2800 


2917 


6 


244366.6 


1570846H1 


2811 


3018 


6 


244366.6 


3898114H1


2621 


2859 


6 


244366.6 


874422H1 


2824 


3131 


6 


244366.6 


91349372 


2825 


2952 


6 


244366.6 


2815235H1 


2842 


3116 


6 


244366.6 


4897079H1 


2925 


3201 


6 


244366.6 


1231478H1 


2632 


2861 


6 


244366.6 


2152887H1 


2927 


3042 


6 


244366.6 


3510024H1 


2635 


2917 


6 


244366.6 


1975441F6


2938 


3306 


6 


244366.6 


2189605H1 


2938 


3203 


6 


244366.6 


1975441H1 


2938 


3088 


6 


244366.6 


3470739H1 


2938 


3153 


6 


244366.6 


g1844965


2643 


2917 


6 


244366.6 


1623474H1 


2155 


2382 


6 


244366.6 


1338343H1 


1799 


2055 


6 


244366.6 


2022520H1 


2157 


2423 


6 


244366.6 


805131H1


2200 


2397 


6 


244366.6 


795024H1 


2202 


2393 


6 


244366.6 


1338343F6


1799 


2241 


6 


244366.6 


1297158H1 


1810 


2050 


6 


244366.6 


3354386H1 


2223 


2491 


6 


244366.6 


2540345H1


2224 


2461 


6 


244366.6 


2313137H1 


1887 


2152 


6 


244366.6 


2805024H1 


2233 


2536 


6 


244366.6 


2454581T6


2281 


2827 


6 


244366.6 


g570404 


1961 


2245 


6 


244366.6 


2773281H1


2282 


2528 


6 


244366.6 


3246058T6 


2284 


2807 


6 


244366.6 


3332425T6 


2286 


2817 


6 


244366.6 


3321733H1 


1988 


2109 


6 


244366.6 


92153937 


2324 


2754 


6 


244366.6 


1464642H1 


1995 


2224 


6 


244366.6 


g1319564


2331 


2938 


6 


244366.6 


3555988H1 


2005 


2304 


6 


244366.6 


3384030H1 


2043 


2317 


6 


244366.6 


g1898453


2332 


2760 


6 


244366.6 


g1062443


2337 


2746 


6 


244366.6 


3188982H1 


2348 


2686 


6 


244366.6 


2255759H1 


2045 


2318 


6 


244366.6 


2n2529Hl 


2355 


2621 


6 


244366.6 


3693731H1


2364 


2662 


6 


244366.6 


1864803H1 


2069 


2351 


6 


244366.6 


853003T6 


2385 


2815 


6 


244366.6 


g1925211


2137 


2615 


6 


244366.6 


2641982H1 


2385 


2598 


6 


244366.6 


2516658H1 


2397 


2535 


6 


244366.6 


1864803T6 


2414 


2898 


6 


244366.6 


1338343T6 


2421 


2814 


6 


244366.6 


1398258H1 


2440 


2681 


6 


244366.6 


1829874H1 


2471 


2738 


6 


244366.6 


589137H1


2478 


2685 


6 


244366.6 


4855638H1 


2147 


2411 


6 


244366.6 


1698592H1 


2500 


2726 


6 


244366.6 


3524562H1 


2154 


2348 


6 


244366.6 


g1319504


2523 


2952 


6 


244366.6 


1623474F6


2155 


2682 


7 


405313.4


4640462H1 


573 


837 


7 


405313.4


g1774849


595 


979 


7 


405313.4


4721077H1 


54 


T94 


7 


405313.4 


5944975H1 


61 


370 


7 


405313.4 


1948647H1 


596 


828 


7 


405313.4 


1592016H1


86 


282 


7 


405313.4 


1948647R6 


596 


1136


7 


405313.4 


g4070751 


686 


1137


7 


405313.4 


2384959H1 


86 


263 


7 


405313.4 


4571373H1 


715 


978 


7 


405313.4


g954058 


893 


1203 


7 


405313.4 


1559555H1 


903 


1120


7 


405313.4


1559555F6 


903 


1363 


7 


405313.4 


g617633


914 


1316 


7 


405313.4 


1302977H1 


86 


257 


7 


405313.4


4215272H1 


919 


1195


7 


405313.4 


g1492868


88 


230 


7 


405313.4


4643815H1


962 


1212 


7 


405313.4 


965308H1 


968 


1255 


7 


405313.4 


965308R1 


968 


1622 


7 


405313.4 


1321926T6 


132 


483 


7 


405313.4 


5136028H1 


993 


1266 


7 


405313.4 


g4264253 


155 


609 


7 


405313.4 


g3739298 


156 


610 


7 


405313.4 


4306178H1


1008 


1207 


7 


405313.4 


g1237752


1009 


1175 


7 


405313.4 


4551446H1 


1047 


1310 


7 


405313.4 


g4522654 


210 


522 


7 


405313.4 


1628853H1 


1049 


1219 


7 


405313.4 


1627193H1 


1049 


1261 


7 


405313.4 


1316291H1


349 


522 


7 


405313.4 


1628853F6 


1049 


1649 


7 


405313.4 


1283759H1 


1053 


1330 


7 


405313.4 


4312209H1 


1067 


1367 


7 


405313.4 


4103086H1 


1115 


1239 


7 


405313.4 


g1984348


1147 


1472 


7 


405313.4 


92540589 


1168 


1616 


7 


405313.4 


g1492809


369 


527 


7 


405313.4 


3584223H1 


1177 


1354 


7 


405313.4 


102569H1 


458 


607 


7 


405313.4 


4829437H1 


540 


737 


7 


405313.4 


3674811H1


1186 


1495 


7 


405313.4 


1733325H1 


1192 


1412 


7 


405313.4 


g1984560


561 


804 


7 


405313.4 


g2010449 


2002 


2335 


7 


405313.4 


2682711H1


2014 


2309 


7 


405313.4 


g4535531 


2016 


2337 


7 


405313.4 


g1773873


2022 


2340 


7 


405313.4 


g4137010


2025 


2346 


7 


405313.4 


4ni488Hl 


2026 


2288 


7


405313.4


2153069H1 


2034 


2315 


7 


405313.4


870657R1 


2041 


2613 


7 


405313.4


870657H1 


2041 


2250 


7 


405313.4


5433876H1 


2063 


2317 


7 


405313.4 


g3162264


2073 


2338 


7 


405313.4 


g3057393 


2080 


2337 


7 


405313.4 


g3872586 


2081 


2335 


7 


405313.4 


2081907T6 


2084 


2288 


7 


405313.4 


659926H1 


2097 


2337 


7 


405313.4 


056422H1 


2097 


2317 


7 


405313.4 


3486469H1 


2105 


2337 


7 


405313.4 


817086R1 


2127 


2337 


7 


405313.4 


817086H1 


2127 


2388 


7 


405313.4 


817086T1 


2127 


2280 


7 


405313.4 


g3597649 


2142 


2335 


7 


405313.4 


3988965H1 


2205 


2499 


7 


405313.4 


2664989H1 


1633 


1850 


7 


405313.4 


3962192H1 


1645 


1776 


7 


405313.4 


2290557H1 


1645 


1921 


7 


405313.4 


4n5475Hl 


1647 


1861 


7 


405313.4 


9670108 


1649 


1960 


7 


405313.4


g570685 


1650 


1964 


7 


405313.4 


2213032H1 


1653 


1919 


7 


405313.4


1667188H1 


1653 


1775 


7 


405313.4


2285586H1 


1653 


1841 


7 


405313.4 


990792H1 


1669 


1968 


7 


405313.4 


1283707T6 


1682 


2306 


7 


405313.4 


3781966H1 


1686 


2020 


7 


405313.4 


2402093H1 


1700 


1946 


7 


405313.4 


3666248H1 


1707 


1866 


7 


405313.4 


1948647T6 


1711 


2289 


7 


405313.4 


5681549H1 


1746 


2012 


7 


405313.4 


2885455T6 


1752 


2292 


7 


405313.4 


1301039H1 


1790 


2095 


7 


405313.4 


1669180T6


1799 


2303 


7 


405313.4 


1669180H1 


1808 


2038 


7 


405313.4 


3436329H1 


1815 


1956 


7 


405313.4 


g1624740


1828 


2218 


7 


405313.4 


1559555T6 


1831 


2302 


7 


405313.4 


3106278H1 


1846 


2121 


7 


405313.4 


2323323T6 


1848 


2308 


7 


405313.4 


1914641H1 


1848 


2062 


7 


405313.4 


1628853T6 


1854 


2300 


7 


405313.4 


g2946487 


1855 


2335 


7 


405313.4 


2081907F6 


1854 


2313 


7 


405313.4 


2081917H1 


1854 


2015 


7 


405313.4 


g672957 


1859 


2199 


7 


405313.4 


2916179H1


1878 


2177 


7 


405313.4 


3556793T6 


1881 


2306 


7 


405313.4 


g2783325 


1885 


2345 


7 


405313.4 


1268187F1 


1888 


2344 


7 


405313.4 


1268187H1 


1888 


2152 


7 


405313.4 


1268187F6 


1888 


2203 


7 


405313.4 


1268187T6 


1890 


2317 


7 


405313.4 


g616527


1892 


2244 


7 


405313.4 


g3593850 


1896 


2335 


7 


405313.4 


9573001 


1897 


2264 


7 


405313.4 


9815353 


1898 


2253 


7 


405313.4 


94083770 


1902 


2335 


7 


405313.4 


94281927 


1912 


2337 


7 


405313.4 


92753877 


1930 


2345 


7 


405313.4 


2199211H1


1930 


2188 


7 


405313.4 


2375908H1 


1934 


2181 


7 


405313.4 


93797979 


1954 


2337 


7 


405313.4 


2653312H1


1963 


2221 


7 


405313.4 


g4265408 


1967 


2342 


7 


405313.4 


93919084 


1969 


2335 


7 


405313.4 


g4085496 


1970 


2334 


7 


405313.4 


g668546 


1976 


2156 


7 


405313.4 


2149388H1 


1985 


2274 


7 


405313.4 


2601556H1 


1995 


2280 


7 


405313.4 


92789326 


2898 


3179 


7 


405313.4 


9646138 


2913 


3179 


7 


405313.4 


9888680 


2916 


3206 


7 


405313.4 


9646137 


2924 


3179 


7 


405313.4 


9645108 


2927 


3179 


7 


405313.4 


9917579 


2933 


3178 


7 


405313.4 


93051580 


2943 


3179 


7 


405313.4 


93764150 


2943 


3179 


7 


405313.4 


92903435 


2947 


3179 


7 


405313.4 


g1225232


2947 


3179 


7 


405313.4 


g1087854


2947 


3111 


7 


405313.4 


g1647814


2998 


3213 


7 


405313.4 


g917686


3058 


3210 


7 


405313.4 


1226453T1


3087 


3168 


7 


405313.4 


1226453H1


3087 


3167 


7 


405313.4 


g3174481


3089 


3178 


7 


405313.4 


g646728 


2878 


3179 


7 


405313.4 


g884856 


2884 


321 


7 


405313.4 


g815354


2895 


3216 


7 


405313.4 


2714769H1 


2231 


2345 


7 


405313.4 


1440914H1


2252 


2421 


7 


405313.4 


1440914F6 


2252 


2675 


7 


405313.4 


4311940H1


2263 


2546 


7 


405313.4 


6073069H1 


2266 


2498 


7 


405313.4


549605H1 


2285 


2554 




mUvJO I O 


2157285H1 


2287 


2523 




t+UOO 1 O.M 


2323179H1


2324 


2460 


7 


405313.4 


2323108H1 


2324 


2581 




405313.4


g2197270


2337 


2696 




405313.4


2294826H1 


2444 


2517




/nc;'5T /I 
AUOo i o.a 


g t u^owz 








405313.4


38601 lOHl 


2444 


2682 




•4U0J 1 o.^ 


"50/1 AO'>Ot-n 








4UOo 1 O.A 


i yoo I /on t 


2480 


2773 






t zooon 1 


2495 


2794 




405313.4 


g770052 


2623 


2930 






1415784H1 


2660 


2922 


7 


405313.4 


g884855 


2666 


3025 




/inR^i /I 

^UOO 1 O .M 


2014171H1








405313.4 


g888679 


2667 


3021 




^UoO 1 o.** 


2224791H1


2686 


2955 


7 


405313.4 


4106915H1 


2695 


2996 


7 


405313.4 


g2217789


2772 


3181 


7 


405313.4 


g2874275 


2772 


3179 


7 


405313.4 


1440914R1 


2773 


3179 


7 


405313.4 


g4075424 


2775 


3179 


7 


405313.4


g4328896 


2776 


3179 


7 


405313.4 


287748H1 


2778 


3143 


7 


405313.4 


3496950H1


2802 


3087 


7 


405313.4 


2750652H1 


2806 


3105 


7 


405313.4 


g765774


2820 


3182 


7 


405313.4 


g3895056 


2831 


3179 


7 


405313.4 


g4450984 


2833 


3179 


7 


405313.4 


g2099917 


2853 


3348 


7 


405313.4 


g2018248 


2860 


2990 


7 


405313.4 


9564562 


2869 


3179 


7 


405313.4 


g2459191 


2869 


3206 


7 


405313.4 


g1099005


2521 


2798 


7 


405313.4 


2731146H1 


2559 


2794 


7 


405313.4 


g1198836


2578 


2840 


7 


405313.4 


2229784H1 


2603 


2852 


7 


405313.4 


2229548H1 


2603 


2858 


7 


405313.4 


1321926H1 


1 


235 


7 


405313.4 


1321926F6 


1 


388 


7 


405313.4 


1337104H1 


8 


265 


7 


405313.4 


2682113H1


8 


281 


7 


405313.4 


g2754255 


1191 


1654 


7 


405313.4 


4541436H1 


1206 


1470 


7 


405313.4 


92558152 


1209 


1673 


7 


405313.4 


g2540638 


1218 


1616 


7 


405313.4 


1282095H1 


1225 


1484 


7 


405313.4 


1283707H1 


1225 


1511 


7 


405313.4 


1282048H1 


1225 


1506 


7 


405313.4 


1283707F6 


1226 


1659 


7 


405313.4 


3280293H1 


1234 


1511 


7 


405313.4 


5398765H1 


1241 


1391 


7 


405313.4 


5072115H1 


1293 


1556 


7 


405313.4 


5219645H1 


1319 


1443 


7 


405313.4 


1893001H1


1317 


1616 


7 


405313.4 


3696826H1 


1319 


1614 


7 


405313.4 


5920192H1


1319 


1647 


7 


405313.4 


g3770003 


1322 


1609 


7 


405313.4 


5665259H1 


1327 


1555 


7 


405313.4 


3151495H1 


1326 


1625 


7 


405313.4 


3357256H2 


1345 


1500 


7 


405313.4 


692886H1 


1353 


1604 


7 


405313.4 


5400503H1 


1388 


1623 


7 


405313.4 


2323323H1 


1395 


1657 


7 


405313.4 


2323323R6 


1395 


1895 


7 


405313.4 


4622312H1


1404 


1718 


7 


405313.4 


5373496H1 


1438 


1700 


7 


405313.4 


624120H1 


1456 


1728 


7 


405313.4 


2535460H1 


1469 


1750 


7 


405313.4 


2115405H1 


1505 


1801 


7 


405313.4 


g954059 


1516 


1748 


7 


405313.4 


2292561H1


1538 


1790 


7 


405313.4 


2556287H1 


1580 


1842 


7 


405313.4 


g866975 


1610 


1953 


7 


405313.4 


861911R1


1619 


2200 


7 


405313.4 


861911H1


1619 


1876 


7 


405313.4 


g873285 


1626 


2006 


8 


436857.2 


232864F1 


1467 


1959 


8 


436857.2 


2704880T6 


1506 


1925 


8 


436857.2 


g4373224 


1522 


1968 


8 


436857.2 


94112872 


1522 


1954 


8 


436857.2 


g4390509 


1556 


1962 


8 


436857.2 


5858881H1


1624 


1888 


8 


436857.2 


5267222H1 


1653 


1883 


8 


436857.2 


1477850T6 


1680 


2225 


8 


436857.2 


4617960T6 


1761 


2215 


8 


436857.2 


9917596 


1901 


2223 


8


436857.2 


3270104H1


1906 


2161 


8 


436857.2 


5782044H1


2003 


2209 


8


436857.2 


g4268407 


2010 


2240 


8 


436857.2 


g3051680 


2063 


2241 


8 


436857.2 


481468H1 


2063 


2250 


8 


436857.2 


5683178H1


1 


212 


8 


436857.2 


1477850H1


56 


278 


8 


436857.2 


1477850F6 


56 


488 




436857.2 


4619212H1 


138 


290 


8 


436857.2 


g1978924


207 


556 


8


436857.2 


3758509H1 


206 


498 


8 


436857.2 


4617960H1 


323 


550 




mOOOQ/ 


m \ oto 1 n \ 


323 


555 




^oOuO/ -Z 


4617960F6


323 


761 






t VTzyZ'-*n 1 








40000/.Z 


^/i9AonAn 
yazoyuou 






8 


436857.2 


4ZDOOVUn 1 






8 


4j6ob/.2 


4/0 1 / /Un 1 


797 

727 


1004 


8 


436857.2 


4613106H1






8 


436857.2 


gzJUU/oy 


70^ 

795 


in9- 


8 


436857.2 


2704880H1 








/nAfll^T 9 

moooo/ .z 


Z/ LKiooUrO 


883 


1313 


8 


436857.2 


2707669H1 


961 


1264 


8 


436857.2 


805609H1


1054 


1283 


8 


436857.2 


805609R1 


1054 


1630 


8 


436857.2 


4135963H1 


1113 


1415 


8 


436857.2 


4294249H1 


1152 


1398 


8 


436857.2 


5450335H1 


1156 


1421 


8 


436857.2 


5373082H1 


1203 


1419 


8 


436857.2 


4190554H1 


1247 


1412


CLAIMS




consisting of: 

a) a polynucleotide sequence selected from the group consisting of SEQ ID NO:1-14,

b) a naturally occurring polynucleotide sequence having at least 90% sequence identity to a
polynucleotide sequence selected from the group consisting of SEQ ID NO:1-14,

c) a polynucleotide sequence complementary to a),

d) a polynucleotide sequence complementary to b), and

e) an RNA equivalent of a) through d). 


2. An isolated polynucleotide of claim 1, comprising a polynucleotide sequence selected 
from the group consisting of SEQ ID NO:1-14.

3. An isolated polynucleotide comprising at least 60 contiguous nucleotides of a 
polynucleotide of claim 1.

4. A composition for the detection of expression of disease detection and treatment molecule 
polynucleotides comprising at least one of the polynucleotides of claim 1 and a detectable label.



5. A method for detecting a target polynucleotide in a sample, said target polynucleotide 
having a sequence of a polynucleotide of claim 1, the method comprising:



a) amplifying said target polynucleotide or fragment thereof using polymerase chain reaction 
amplification, and 

b) detecting the presence or absence of said amplified target polynucleotide or fragment 
thereof, and, optionally, if present, the amount thereof. 

6. A method for detecting a target polynucleotide in a sample, said target polynucleotide 
comprising a sequence of a polynucleotide of claim 1, the method comprising:




a) hybridizing the sample with a probe comprising at least 20 contiguous nucleotides 
comprising a sequence complementary to said target polynucleotide in the sample, and which probe 
specifically hybridizes to said target polynucleotide, under conditions whereby a hybridization 
complex is formed between said probe and said target polynucleotide, and

b) detecting the presence or absence of said hybridization complex, and, optionally, if
present, the amount thereof. 

7. A method of claim 5, wherein the probe comprises at least 30 contiguous nucleotides.

8. A method of claim 5, wherein the probe comprises at least 60 contiguous nucleotides.

9. A recombinant polynucleotide comprising a promoter sequence operably linked to a
polynucleotide of claim 1.

10. A cell transformed with a recombinant polynucleotide of claim 9. 

11. A transgenic organism comprising a recombinant polynucleotide of claim 9.

12. A method for producing a disease detection and treatment molecule polypeptide, the
method comprising:

a) culturing a cell under conditions suitable for expression of the disease detection and
treatment molecule polypeptide, wherein said cell is transformed with a recombinant polynucleotide
of claim 9, and

b) recovering the disease detection and treatment molecule polypeptide so expressed.

13. A purified disease detection and treatment molecule polypeptide (MDDT) encoded by at 
least one of the polynucleotides of claim 2.

14. An isolated antibody which specifically binds to a disease detection and treatment
molecule polypeptide of claim 13.

15. A method of identifying a test compound which specifically binds to the disease 
detection and treatment molecule polypeptide of claim 13, the method comprising the steps of:

a) providing a test compound;

b) combining the disease detection and treatment molecule polypeptide with the test 
compound for a sufficient time and under suitable conditions for binding; and 

c) detecting binding of the disease detection and treatment molecule polypeptide to the 
test compound, thereby identifying the test compound which specifically binds the disease detection
and treatment molecule polypeptide.

16. A microarray wherein at least one element of the microarray is a polynucleotide of claim



17. A method for generating a transcript image of a sample which contains polynucleotides, 
the method comprising the steps of: 

a) labeling the polynucleotides of the sample,

b) contacting the elements of the microarray of claim 16 with the labeled polynucleotides of 
the sample under conditions suitable for the formation of a hybridization complex, and 

c) quantifying the expression of the polynucleotides in the sample. 

18. A method for screening a compound for effectiveness in altering expression of a target 
polynucleotide, wherein said target polynucleotide comprises a polynucleotide sequence of claim 1,
the method comprising: 

a) exposing a sample comprising the target polynucleotide to a compound, and 

b) detecting altered expression of the target polynucleotide. 

19. A method of claim 6 for toxicity testing of a compound, further comprising 

(c) comparing the presence, absence or amount of said target polynucleotide in a first 
biological sample and a second biological sample, wherein said first biological sample has been 
contacted with said compound, and said second sample is a control, whereby a change in presence, 
absence or amount of said target polynucleotide in said first sample, as compared with said second 
sample, is indicative of toxic response to said compound. 

20. A method for screening a compound for effectiveness in altering expression of a target 
polynucleotide, wherein said target polynucleotide comprises a polynucleotide sequence of claim 1, 
the method comprising: 

a) exposing a sample comprising the target polynucleotide to a compound, under conditions 
suitable for the expression of the target polynucleotide, 

b) detecting altered expression of the target polynucleotide, and 

c) comparing the expression of the target polynucleotide in the presence of varying amounts 
of the compound and in the absence of the compound. 



21. A method for assessing toxicity of a test compound, said method comprising: 

a) treating a biological sample containing nucleic acids with the test compound;

b) hybridizing the nucleic acids of the treated biological sample with a probe comprising at 
least 20 contiguous nucleotides of a polynucleotide of claim 1 under conditions whereby a specific 
hybridization complex is formed between said probe and a target polynucleotide in the biological
sample, said target polynucleotide comprising a polynucleotide sequence of a polynucleotide of
claim 1 or fragment thereof;

c) quantifying the amount of hybridization complex; and

d) comparing the amount of hybridization complex in the treated biological sample with the 
amount of hybridization complex in an untreated biological sample, wherein a difference in the 
amount of hybridization complex in the treated biological sample is indicative of toxicity of the test 
compound. 

22. An array comprising different nucleotide molecules affixed in distinct physical locations 
on a solid substrate, wherein at least one of said nucleotide molecules comprises a first 
oligonucleotide or polynucleotide sequence specifically hybridizable with at least 30 contiguous 
nucleotides of a target polynucleotide, said target polynucleotide having a sequence of claim 1. 

23. An array of claim 22, wherein said first oligonucleotide or polynucleotide sequence is
completely complementary to at least 30 contiguous nucleotides of said target polynucleotide.

24. An array of claim 22, wherein said first oligonucleotide or polynucleotide sequence is
completely complementary to at least 60 contiguous nucleotides of said target polynucleotide.

25. An array of claim 22, which is a microarray.

26. An array of claim 22, further comprising said target polynucleotide hybridized to said
first oligonucleotide or polynucleotide.

27. An array of claim 22, wherein a linker joins at least one of said nucleotide molecules to
said solid substrate.

28. An array of claim 22, wherein each distinct physical location on the substrate contains
multiple nucleotide molecules having the same sequence, and each distinct physical location on the 
substrate contains nucleotide molecules having a sequence which differs from the sequence of 
nucleotide molecules at another physical location on the substrate. 

29. A polypeptide encoded by the polynucleotide of SEQ ID NO:1.

30. A polypeptide encoded by the polynucleotide of SEQ ID NO:2.

31. A polypeptide encoded by the polynucleotide of SEQ ID NO:3.

32. A polypeptide encoded by the polynucleotide of SEQ ID NO:4. 

33. A polypeptide encoded by the polynucleotide of SEQ ID NO:5. 

34. A polypeptide encoded by the polynucleotide of SEQ ID NO:6. 

35. A polypeptide encoded by the polynucleotide of SEQ ID NO:7. 

36. A polypeptide encoded by the polynucleotide of SEQ ID NO:8. 

37. A polypeptide encoded by the polynucleotide of SEQ ID NO:9.

38. A polypeptide encoded by the polynucleotide of SEQ ID NO:10.

39. A polypeptide encoded by the polynucleotide of SEQ ID NO:11.

40. A polypeptide encoded by the polynucleotide of SEQ ID NO:12.

41. A polypeptide encoded by the polynucleotide of SEQ ID NO:13.

42. A polypeptide encoded by the polynucleotide of SEQ ID NO:14.

43. A polynucleotide of claim 1, comprising the polynucleotide sequence of SEQ ID NO:1.

44. A polynucleotide of claim 1, comprising the polynucleotide sequence of SEQ ID NO:2. 

45. A polynucleotide of claim 1, comprising the polynucleotide sequence of SEQ ID NO:3. 

46. A polynucleotide of claim 1, comprising the polynucleotide sequence of SEQ ID NO:4.

47. A polynucleotide of claim 1, comprising the polynucleotide sequence of SEQ ID NO:5. 

48. A polynucleotide of claim 1, comprising the polynucleotide sequence of SEQ ID NO:6.

49. A polynucleotide of claim 1, comprising the polynucleotide sequence of SEQ ID NO:7.

50. A polynucleotide of claim 1, comprising the polynucleotide sequence of SEQ ID NO:8.

51. A polynucleotide of claim 1, comprising the polynucleotide sequence of SEQ ID NO:9.

52. A polynucleotide of claim 1, comprising the polynucleotide sequence of SEQ ID NO:10.

53. A polynucleotide of claim 1, comprising the polynucleotide sequence of SEQ ID NO:11.

54. A polynucleotide of claim 1, comprising the polynucleotide sequence of SEQ ID NO:12.

55. A polynucleotide of claim 1, comprising the polynucleotide sequence of SEQ ID NO:13.

56. A polynucleotide of claim 1, comprising the polynucleotide sequence of SEQ ID NO:14.

the specification of which:

application in accordance with Title 37, Code of Federal Regulations, § 1.56(a).